<anuraguni...@yahoo.com> wrote in message news:994147fb-cdf3-4c55-8dc5-62d769b12...@u9g2000pre.googlegroups.com...
Sorry for being unclear again; hmm, I am becoming an expert at it.

I pasted that code as a continuation of my old code from the start, i.e.:
 class A(object):
     def __unicode__(self):
         return u"©au"

     def __repr__(self):
         return unicode(self).encode("utf-8")
     __str__ = __repr__

By "doesn't work" I mean it throws a unicode error.
My question boils down to this: what is the difference between the two statements below, and why does one throw an error while the other doesn't?
print unicode(a)
vs
print unicode([a])

That is still an incomplete example. Your results depend on your source file's encoding and your system's stdout encoding. Assuming a = A(), unicode(a) returns u'©au', which print then converts to stdout's encoding for display. An encoding that lacks the copyright character, such as cp437 (U.S. Windows console), will fail.

unicode([a]) is a different case. A list has no __unicode__ method, so unicode() falls back to str([a]), which is built from the repr of each element. Your __repr__ returns a UTF-8 byte string, so the result is a byte string containing non-ASCII bytes. The unicode() function, given a byte string of unspecified encoding, uses the ASCII codec, and that is where the error comes from. unicode(str([a]), 'utf-8') will correctly convert it to unicode, and then printing that unicode string will attempt to convert it to stdout's encoding. On a utf-8 console it will work; on a cp437 console it will not.
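
The decoding step can be reproduced without any console involved. This is a minimal sketch, not code from the thread: it uses explicit .decode() calls, which behave the same way in Python 2 and 3, and it assumes that unicode(s) with no encoding argument acts like s.decode('ascii') (the Python 2 default). The name list_repr is made up for illustration and stands in for the byte string that str([a]) returns.

```python
# Bytes that str([a]) would produce: '[' + UTF-8 bytes of u'\xa9au' + ']'
# (list_repr is a hypothetical name used only for this illustration)
list_repr = b'[' + u'\xa9au'.encode('utf-8') + b']'

# Decoding with the ASCII codec fails on the first non-ASCII byte (0xc2).
try:
    list_repr.decode('ascii')
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Naming the actual encoding of the bytes succeeds.
decoded = list_repr.decode('utf-8')
```

Whether the decoded text can then be printed is a separate question that depends only on stdout's encoding.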

Here's a new one:

In PythonWin (from pywin32-313), stdout is utf-8, so:

print '©'  # this is a utf8 byte string
©
'©'  # view the utf8 bytes
'\xc2\xa9'
u'©'  # view the unicode character
u'\xa9'
print '\xc2\xa9'  # stdout is utf8, so it is understood
©
print u'\xa9'  # auto-converts to utf8.
©
print unicode('\xc2\xa9')  # encoding not given, defaults to ASCII.
Traceback (most recent call last):
 File "<interactive input>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)
print unicode('\xc2\xa9','utf8')  # provide the encoding
©

This gives different results when the stdout encoding is different. Here are a couple of the same statements on my Windows console with cp437 encoding, which doesn't support the copyright character:

print '\xc2\xa9'  # stdout is cp437, so the utf8 bytes are shown as cp437 characters
┬⌐
print u'\xa9'  # tries to convert to cp437
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 File "C:\dev\python\lib\encodings\cp437.py", line 12, in encode
   return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\xa9' in position 0: character maps to <undefined>
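
The console-dependent half can also be checked without switching consoles, by encoding to each codec by hand, which is roughly what print does with a unicode string. This is a small sketch assuming the standard utf-8 and cp437 codecs; the variable names are illustrative, and the 'replace' error handler at the end is one possible workaround rather than something suggested in this thread.

```python
copyright_sign = u'\xa9'

# UTF-8 can represent every Unicode character.
utf8_bytes = copyright_sign.encode('utf-8')

# cp437 has no copyright sign, so strict encoding raises an error.
try:
    copyright_sign.encode('cp437')
    cp437_ok = True
except UnicodeEncodeError:
    cp437_ok = False

# The 'replace' handler substitutes '?' for unencodable characters.
fallback = copyright_sign.encode('cp437', 'replace')

# The UTF-8 bytes, interpreted as cp437, become two unrelated characters.
misread = utf8_bytes.decode('cp437')
```

The last line is why printing the raw UTF-8 byte string on a cp437 console does not fail but shows the wrong characters: bytes are passed through unchanged, and the console interprets them in its own code page.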

Hope that helps your understanding,
Mark


