Ma Lin added the comment:
>> I examined all Chinese codecs
I said that above, but I forgot that Chinese is also used in Taiwan and Hong Kong.
The BIG5 and CP950 codecs use a wrong conversion table. Try this:
>>> u = b'\xC6\xA1'.decode('big5')
>>> hex(ord(u))
'0x30fe'
This should not happen: 0xC6A1 is in neither BIG5 nor CP950.
In BIG5-2003 and HKSCS-2008, 0xC6A1 is mapped to U+2460.
I only took a rough look, so please check further.
I won't check the Hong Kong codec any further; I suggest that someone check it as well.
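
To compare how the related codecs treat the same bytes, a small sketch like the one below may help. The helper name `probe` is hypothetical, not part of Python; the codec names `big5`, `cp950` and `big5hkscs` are the standard library ones. No expected output is shown, since the result depends on the conversion tables of the Python build being tested.

```python
# Hypothetical helper (not part of Python): decode one byte sequence with
# several codecs and report the resulting code points, or note a failure.
def probe(raw, codecs=("big5", "cp950", "big5hkscs")):
    results = {}
    for name in codecs:
        try:
            text = raw.decode(name)
        except UnicodeDecodeError:
            # The codec has no mapping for these bytes.
            results[name] = "undecodable"
        else:
            # A codec may yield more than one code point, so join them all.
            results[name] = " ".join("U+%04X" % ord(ch) for ch in text)
    return results


# Check the byte pair discussed above against each codec.
for name, result in probe(b"\xC6\xA1").items():
    print(name, result)
```

The same helper can be run over a range of lead and trail bytes to look for other suspicious mappings.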
----------
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue24117>
_______________________________________