Re: unicode bit me

Mark Tolonen Sat, 09 May 2009 11:07:53 -0700

<[email protected]> wrote in message news:[email protected]...

Sorry for being unclear again; hmm, I am becoming an expert at it.


I pasted that code as a continuation of my old code from the start,
i.e.
 class A(object):
     def __unicode__(self):
         return u"©au"

     def __repr__(self):
         return unicode(self).encode("utf-8")
     __str__ = __repr__

By "doesn't work" I mean it throws a unicode error.
My question boils down to this:
what is the difference between the following, and why does one throw an error while the other doesn't?
print unicode(a)
vs
print unicode([a])

That is still an incomplete example. Your results depend on your source code's encoding and your system's stdout encoding. Assuming a = A(), unicode(a) returns u'©au', but printing it then converts it to stdout's encoding for display. An encoding such as cp437 (U.S. Windows console) will fail.

The repr of [a] is a byte string; with the __repr__ above it holds UTF-8 bytes, provided Python decoded the u"©au" literal correctly from your source file. The unicode() function, given a byte string of unspecified encoding, uses the ASCII codec, which is why unicode([a]) throws. Note that unicode() only accepts an encoding argument for a string or buffer, so the list has to be converted first: unicode(str([a]), 'utf-8') will correctly convert it to unicode, and printing that unicode string will then attempt to convert it to stdout's encoding. On a utf-8 console it will work; on a cp437 console it will not.
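To make the unicode([a]) case concrete, here is a rough sketch of the steps involved. It does not use the class A itself; instead it builds the byte string that str([a]) would produce by hand, so the names item_repr and list_repr are illustrative only. It is written with b'' and u'' literals so it should run under both Python 2.6+ and Python 3.3+.

```python
# Sketch only: simulates the byte string that str([a]) produces for the
# class A quoted above, without relying on Python 2's __unicode__ hook.

# What A.__repr__ returns: the text u'\xa9au' encoded as UTF-8 bytes.
item_repr = u'\xa9au'.encode('utf-8')

# str([a]) wraps the element's repr in brackets, still as a byte string.
list_repr = b'[' + item_repr + b']'

# unicode([a]) decodes that byte string with the ASCII codec, which
# fails on the first non-ASCII byte (0xc2).
try:
    list_repr.decode('ascii')
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Naming the actual encoding succeeds and yields proper text.
decoded = list_repr.decode('utf-8')
```

unicode(a) never hits this path because it calls __unicode__ directly and gets text back, with no byte string to decode.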


Here's a new one:

In PythonWin (from pywin32-313), stdout is utf-8, so:

>>> print '©'  # this is a utf8 byte string
©
>>> '©'  # view the utf8 bytes
'\xc2\xa9'
>>> u'©'  # view the unicode character
u'\xa9'
>>> print '\xc2\xa9'  # stdout is utf8, so it is understood
©
>>> print u'\xa9'  # auto-converts to utf8.
©
>>> print unicode('\xc2\xa9')  # encoding not given, defaults to ASCII.
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)
>>> print unicode('\xc2\xa9','utf8')  # provide the encoding
©
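The byte-string and unicode views in that session are two forms of the same character, related by encode and decode. A minimal sketch of the round trip, assuming nothing beyond the built-in codecs (the variable names are mine, and the literals are chosen so it should run under Python 2.6+ as well as 3.3+):

```python
# One code point: U+00A9 COPYRIGHT SIGN.
text = u'\xa9'

# Encoding to UTF-8 produces two bytes, 0xc2 0xa9.
utf8_bytes = text.encode('utf-8')

# Decoding with the same codec recovers the original text.
round_trip = utf8_bytes.decode('utf-8')

# Length in characters versus length in bytes.
lengths = (len(text), len(utf8_bytes))
```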

This gives different results when the stdout encoding is different. Here are a couple of the same instructions on my Windows console with cp437 encoding, which doesn't support the copyright character:

>>> print '\xc2\xa9'  # stdout is cp437
┬⌐
>>> print u'\xa9'  # tries to convert to cp437
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\dev\python\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\xa9' in position 0: character maps to <undefined>
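Both cp437 results can be reproduced without a cp437 console by calling the codecs directly. This is only a sketch: can_encode is a hypothetical helper of mine, not a standard function, and the 'replace' error handler is one possible fallback rather than something the original code used. It should run under Python 2.6+ and 3.3+.

```python
def can_encode(text, encoding):
    """Return True if every character in text exists in the given codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

copyright_sign = u'\xa9'

# A utf-8 console can represent the character; a cp437 console cannot.
utf8_ok = can_encode(copyright_sign, 'utf-8')
cp437_ok = can_encode(copyright_sign, 'cp437')

# Possible fallback: substitute unencodable characters instead of raising.
fallback = copyright_sign.encode('cp437', 'replace')

# UTF-8 bytes read as cp437 give the two box-drawing style characters
# seen in the console output above.
mojibake = b'\xc2\xa9'.decode('cp437')
```

Checking sys.stdout.encoding before printing and passing it to a helper like this is one way to decide whether to print the text as-is or to fall back to a replacement.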


Hope that helps your understanding,
Mark



--
http://mail.python.org/mailman/listinfo/python-list
