Ma Lin added the comment:

>> I examined all Chinese codecs
I said that above, but I forgot that Taiwan and Hong Kong use Chinese as well.

The BIG5 and CP950 codecs are using a wrong conversion table. Try this:
>>> u = b'\xC6\xA1'.decode('big5')
>>> hex(ord(u))
'0x30fe'

This should not happen: 0xC6A1 is in neither BIG5 nor CP950.
In BIG5-2003 and HKSCS-2008, 0xC6A1 is mapped to U+2460.
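
To compare the codecs side by side, something like the following may help. It is only a minimal sketch: the helper name decode_report is made up for illustration, and it assumes the codec names that CPython ships ('big5', 'cp950', 'big5hkscs'). It decodes the same bytes with each codec and reports the resulting code points, or None when the codec rejects the bytes. The results depend on the Python version, so no expected output is shown.

```python
def decode_report(raw, codecs=("big5", "cp950", "big5hkscs")):
    """Return {codec_name: 'U+XXXX ...' or None} for the given bytes.

    decode_report is a hypothetical helper, not part of any library.
    """
    report = {}
    for name in codecs:
        try:
            text = raw.decode(name)
        except UnicodeDecodeError:
            # The codec has no mapping for this byte sequence.
            report[name] = None
        else:
            # Format every decoded character as a Unicode code point.
            report[name] = " ".join("U+%04X" % ord(ch) for ch in text)
    return report


# Print how each codec handles the byte pair from this report.
for name, result in decode_report(b"\xC6\xA1").items():
    print(name, result)
```

A codec that follows the plain BIG5 or CP950 tables would be expected to show None here, per the reasoning above.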

I only took a rough look, so please check further.
I won't look into the Hong Kong codec any more, but I suggest checking it as well.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue24117>
_______________________________________