Q: a simple(?) raw-utf-8 conversion to internal type unicode \304\246\311\231\316\257\316\271\303\222

2006-12-31 Thread NevilleDNZ
Hi,

Apologies first, as I am not a Unicode expert; indeed, the details
probably totally elude me.  Notwithstanding:  how can I convert a
byte string containing UTF-8 data into a Python unicode string?

cutdown example:
$ cat ./uc.py
#!/usr/bin/env python
imported="\304\246\311\231\316\257\316\271\303\222 \317\216\317\203\305\224\304\271\304\220"
print "English/ASCII quoting:",''+imported+'',"SUCCEEDS :-)" # xterm encoding is UTF8
print "German/ALCOR quoting:",u"\N{runic cross punctuation}"+"test"+u"\N{runic cross punctuation}","AOK :-)"
print "German/ALCOR quoting:",u"\N{runic cross punctuation}"+imported+u"\N{runic cross punctuation}","FAILS :-("

$ ./uc.py
English/ASCII quoting: ĦəίιÒ ώσŔĹĐ SUCCEEDS :-)
German/ALCOR quoting: ᛭test᛭ AOK :-)
German/ALCOR quoting:
Traceback (most recent call last):
  File "./uc.py", line 5, in <module>
    print "German/ALCOR quoting:",u"\N{runic cross punctuation}"+imported+u"\N{runic cross punctuation}","FAILS :-("
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 0: ordinal not in range(128)

The last print statement fails because the characters in imported are
8-bit UTF-8 encoded bytes that get treated as ASCII, and they don't know
it! How do I tell imported that it is actually already UTF-8?
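
A minimal sketch of the failure described above, written for Python 3 as
an assumption (the script in this post is Python 2). Here bytes stands in
for Python 2's 8-bit str, and the explicit decode("ascii") stands in for
the implicit conversion Python 2 attempts when a str is concatenated with
a unicode string. The names raw and failure are illustrative only.

```python
# UTF-8 bytes written with octal escapes, as in the post.
raw = b"\304\246\311\231\316\257\316\271\303\222"

try:
    # Decoding as ASCII fails on the first byte, since 0xc4 is above 127.
    raw.decode("ascii")
    failure = None
except UnicodeDecodeError as exc:
    failure = exc

# Report which codec failed, on which byte, and where.
print(failure.encoding, hex(failure.object[failure.start]), failure.start)
```

The first byte, \304 in octal, is 0xc4, which matches the byte and
position reported in the traceback.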

Cheers
NevilleDNZ

-- 
http://mail.python.org/mailman/listinfo/python-list

Just TOO easy.... Re: Q: a simple(?) raw-utf-8 conversion to internal type unicode \304\246\311\231\316\257\316\271\303\222

2006-12-31 Thread NevilleDNZ
It was just TOO easy... After posting my message to Google Groups, I
re-read the posting there and found that Google had pointed me to a
Python Unicode tutorial:
www.reportlab.com/i18n/python_unicode_tutorial.html - exercise one :-)

Gosh, sometimes a google is worth so much more than ₁₀¹⁰⁰!

Happy New Year
NevilleD

It works now:
$ ./uc.py
English/ASCII quoting: ĦəίιÒ ώσŔĹĐ SUCCEEDS :-)
German/ALCOR quoting: ᛭test᛭ AOK :-)
German/ALCOR quoting: ᛭ĦəίιÒ ώσŔĹĐ᛭ FAILS :-(
[EMAIL PROTECTED]:/root0/home/nevilled/Project/20 $ vi ./uc.py
[EMAIL PROTECTED]:/root0/home/nevilled/Project/20 $ cat ./uc.py
#!/usr/bin/env python
imported=unicode("\304\246\311\231\316\257\316\271\303\222 \317\216\317\203\305\224\304\271\304\220","utf-8")
print "English/ASCII quoting:",''+imported+'',"SUCCEEDS :-)" # xterm encoding is UTF8
print "German/ALCOR quoting:",u"\N{runic cross punctuation}test\N{runic cross punctuation}","AOK :-)"
print "German/ALCOR quoting:",u"\N{runic cross punctuation}"+imported+u"\N{runic cross punctuation}","Just TOO easy :-)"

$ ./uc.py
English/ASCII quoting: ĦəίιÒ ώσŔĹĐ SUCCEEDS :-)
German/ALCOR quoting: ᛭test᛭ AOK :-)
German/ALCOR quoting: ᛭ĦəίιÒ ώσŔĹĐ᛭ Just TOO easy :-)
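
For anyone trying this on Python 3, a rough equivalent of the fix might
look like the sketch below. It assumes the data arrives as a bytes object;
bytes.decode("utf-8") plays the role of unicode(..., "utf-8") in the
script above. The names raw and quoted are illustrative and not part of
the original script.

```python
# The same UTF-8 bytes as in uc.py, written with octal escapes.
raw = b"\304\246\311\231\316\257\316\271\303\222 \317\216\317\203\305\224\304\271\304\220"

# State the encoding explicitly instead of relying on an ASCII default.
imported = raw.decode("utf-8")

# Concatenating with other text now works, because both sides are text.
quoted = "\N{RUNIC CROSS PUNCTUATION}" + imported + "\N{RUNIC CROSS PUNCTUATION}"

# 21 bytes decode to 11 characters; encoding again gives back the input.
print(len(raw), len(imported), imported.encode("utf-8") == raw)
```

The point is the same as in the Python 2 version: the bytes never change,
only the explicit statement of which encoding they are in.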
