[EMAIL PROTECTED] wrote:
Uhm... then there is a misprint in the discussion of the recipe. BTW, what's the difference between .encode and .decode? (Yes, I have been living in happy ASCII-land until now... ;)
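The script in the reply is Python 2, where text is `unicode` and encoded data is a byte `str`. In Python 3 those became `str` and `bytes`, which makes the direction of the two methods explicit: `str.encode` gives `bytes`, `bytes.decode` gives `str`, and neither type has the other method. A minimal Python 3 sketch of the same "encoding as a file format" idea; `save_text` and `load_text` are hypothetical helpers for illustration only:

```python
import os
import tempfile

# Python 3: str holds Unicode code points (Python 2's `unicode`),
# bytes holds encoded data (Python 2's byte `str`).

def save_text(path, text, encoding):
    """Hypothetical helper: encode text (str -> bytes), then write the bytes."""
    data = text.encode(encoding)  # encode: str -> bytes
    with open(path, 'wb') as f:
        f.write(data)
    return data

def load_text(path, encoding):
    """Hypothetical helper: read raw bytes, then decode them (bytes -> str)."""
    with open(path, 'rb') as f:
        data = f.read()
    return data.decode(encoding)  # decode: bytes -> str

text = 'Some danish characters \u00e6\u00f8\u00e5'  # ends in æøå

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'danish.txt')
    raw = save_text(path, text, 'utf-8')
    restored = load_text(path, 'utf-8')

# ascii() escapes non-ASCII characters, much like Python 2's repr() of unicode
print(type(raw).__name__, raw)
print(type(restored).__name__, ascii(restored))
```

In everyday code, opening the file in text mode with `open(path, 'w', encoding='utf-8')` performs the encode (and the matching read performs the decode) for you; the explicit version only makes the two directions visible.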
# -*- coding: latin-1 -*-
# Here I make a unicode string
unicode_file = u'Some danish characters æøå' #.encode('hex')
print type(unicode_file)
print repr(unicode_file)
print ''

# I can convert this unicode string to an ordinary string.
# Because æøå are in the latin-1 charmap, it can be understood as
# a latin-1 string.
# The æøå characters even have the same values in both.
latin1_file = unicode_file.encode('latin-1')
print type(latin1_file)
print repr(latin1_file)
print latin1_file
print ''

## I can *not* convert it to ascii: æøå have no ASCII codes,
## so this line raises UnicodeEncodeError
#ascii_file = unicode_file.encode('ascii')
#print ''

# I can also convert it to utf-8
utf8_file = unicode_file.encode('utf-8')
print type(utf8_file)
print repr(utf8_file)
print utf8_file
print ''

# utf8_file is now an ordinary string. Again, it can help to think of it as a
# file format.
#
# I can convert this file/string back to unicode by using the decode method.
# It tells Python to decode this "file format" as utf-8 when it loads it into a
# unicode string. And we are back where we started.
unicode_file = utf8_file.decode('utf-8')
print type(unicode_file)
print repr(unicode_file)
print ''

# So basically you can encode a unicode string into a special string/file format
# and you can decode a string from a special string/file format back into unicode.
###################################
<type 'unicode'>
u'Some danish characters \xe6\xf8\xe5'

<type 'str'>
'Some danish characters \xe6\xf8\xe5'
Some danish characters æøå

<type 'str'>
'Some danish characters \xc3\xa6\xc3\xb8\xc3\xa5'
Some danish characters æøå

<type 'unicode'>
u'Some danish characters \xe6\xf8\xe5'
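In the output, latin-1 stores each of æøå as one byte (`\xe6`, `\xf8`, `\xe5`), the same numbers as their Unicode code points U+00E6, U+00F8 and U+00E5, while UTF-8 needs two bytes each (`\xc3\xa6` and so on). A minimal Python 3 sketch that rebuilds those UTF-8 bytes by hand, assuming only code points in the two-byte UTF-8 range U+0080 to U+07FF; `utf8_two_byte` is a hypothetical helper written for illustration, checked against the built-in codec. It also shows the ASCII failure and what decoding with the wrong codec does:

```python
def utf8_two_byte(ch):
    """Hypothetical helper: UTF-8 encode one character in U+0080..U+07FF.

    UTF-8 stores such a code point as two bytes, 110xxxxx 10xxxxxx,
    carrying the code point's 11 bits, high bits first.
    """
    cp = ord(ch)
    if not 0x80 <= cp <= 0x7FF:
        raise ValueError('this sketch only handles U+0080..U+07FF')
    return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])

letters = '\u00e6\u00f8\u00e5'  # æøå

# latin-1: one byte per character, equal to the code point
latin1 = letters.encode('latin-1')
assert latin1 == bytes(ord(c) for c in letters)
print(latin1)  # b'\xe6\xf8\xe5'

# UTF-8: two bytes per character here, matching the hand-rolled version
utf8 = letters.encode('utf-8')
assert utf8 == b''.join(utf8_two_byte(c) for c in letters)
print(utf8)  # b'\xc3\xa6\xc3\xb8\xc3\xa5'

# ASCII has no codes for these characters, so strict encoding fails
try:
    letters.encode('ascii')
except UnicodeEncodeError as exc:
    print('ascii failed:', exc.reason)

# Decoding UTF-8 bytes as latin-1 does not fail, but gives 'Ã¦Ã¸Ã¥'
# instead of 'æøå': each byte becomes its own character
print(ascii(utf8.decode('latin-1')))
```

The last line is why the decode side must use the same codec as the encode side: the bytes alone do not say which "file format" they are in.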
--
hilsen/regards Max M, Denmark
http://www.mxm.dk/
IT's Mad Science
--
http://mail.python.org/mailman/listinfo/python-list