Python 3.0 and repr

Mark Tolonen Sun, 28 Sep 2008 10:16:15 -0700

I don't understand the behavior of the interpreter in Python 3.0. I am working at a command prompt in Windows (US English), which has a terminal encoding of cp437.


In Python 2.5:

   Python 2.5 (r25:51908, Sep 19 2006, 09:52:17) [MSC v.1310 32 bit (Intel)] on win32
   Type "help", "copyright", "credits" or "license" for more information.
   >>> x=u'\u5000'
   >>> x
   u'\u5000'

In Python 3.0:

   Python 3.0rc1 (r30rc1:66507, Sep 18 2008, 14:47:08) [MSC v.1500 32 bit (Intel)] on win32
   Type "help", "copyright", "credits" or "license" for more information.
   >>> x='\u5000'
   >>> x
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "c:\dev\python30\lib\io.py", line 1486, in write
       b = encoder.encode(s)
     File "c:\dev\python30\lib\encodings\cp437.py", line 19, in encode
       return codecs.charmap_encode(input,self.errors,encoding_map)[0]
   UnicodeEncodeError: 'charmap' codec can't encode character '\u5000' in position 1: character maps to <undefined>

Where I would have expected

   >>> x
   '\u5000'
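
A minimal sketch of where the failure seems to arise, assuming Python 3 and that the interactive prompt encodes the repr() text with the terminal codec before writing it (the variable names are only illustrative). Building the repr succeeds; only encoding it to cp437 fails, and an error handler such as 'backslashreplace' yields the escaped form expected above:

```python
x = '\u5000'

# repr() itself succeeds; in Python 3 the result keeps the character as-is.
r = repr(x)

# Encoding that text to cp437 is the step that raises, which is roughly
# what happens when the prompt writes the result to a cp437 console.
try:
    r.encode('cp437')
    failed = False
except UnicodeEncodeError:
    failed = True

# With the 'backslashreplace' error handler the unencodable character
# is written as an escape instead of raising.
escaped = r.encode('cp437', errors='backslashreplace')
print(failed, escaped)
```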

Shouldn't a repr() of x work regardless of output encoding?  Another test:

   Python 3.0rc1 (r30rc1:66507, Sep 18 2008, 14:47:08) [MSC v.1500 32 bit (Intel)] on win32
   Type "help", "copyright", "credits" or "license" for more information.
   >>> bytes(range(256)).decode('cp437')
   '\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\
   x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABC
   DEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7fÇüéâäàåçêëèïîìÄÅ
   ÉæÆôöòûùÿÖÜ¢£¥₧ƒáíóúñÑªº¿⌐¬½¼¡«»░▒▓│┤╡╢╖╕╣║╗╝╜╛┐└┴┬├─┼╞╟╚╔╩╦╠═╬╧╨╤╥╙╘╒╓╫╪┘┌█▄▌▐▀
   αßΓπΣσµτΦΘΩδ∞φε∩≡±≥≤⌠⌡÷≈°∙·√ⁿ²■\xa0'
   >>> bytes(range(256)).decode('cp437')[255]
   '\xa0'

Characters that cannot be displayed in cp437 are being escaped, such as 0x00-0x1F, 0x7F, and 0xA0. Even if I incorrectly decode a value, if the character exists in cp437, it is displayed:


   >>> bytes(range(256)).decode('latin-1')[255]
   'ÿ'

However, for a character that isn't supported by cp437, incorrectly decoded:

   >>> bytes(range(256)).decode('cp437')[254]
   '■'
   >>> bytes(range(256)).decode('latin-1')[254]
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "c:\dev\python30\lib\io.py", line 1486, in write
       b = encoder.encode(s)
     File "c:\dev\python30\lib\encodings\cp437.py", line 19, in encode
       return codecs.charmap_encode(input,self.errors,encoding_map)[0]
   UnicodeEncodeError: 'charmap' codec can't encode character '\xfe' in position 1: character maps to <undefined>

Why not display '\xfe' here? It seems like this inconsistency would make it difficult to write things like doctests that don't depend on the tester's terminal. It also makes it difficult to inspect variables without calling hex(ord(n)) on a character-by-character basis. Maybe repr() should always display the ASCII representation with escapes for all other characters, especially considering the "repr() should produce output suitable for eval() when possible" rule.
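
For comparison, a short sketch of the ASCII-only behaviour described above, assuming the Python 3 built-in ascii() is available (it escapes every non-ASCII character, so its output does not depend on the terminal encoding). The variable names are only illustrative:

```python
x = '\u5000'
y = bytes(range(256)).decode('latin-1')[254]

# ascii() works like repr() but escapes all non-ASCII characters.
ax = ascii(x)
ay = ascii(y)
print(ax)
print(ay)

# The escaped form still round-trips through eval().
assert eval(ax) == x
assert eval(ay) == y
```

Since the result is pure ASCII, it can be encoded with cp437 or any other ASCII-compatible codec without raising.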


What are others' opinions?  Any insight to this design decision?

-Mark


--
http://mail.python.org/mailman/listinfo/python-list
