New submission from Mingye Wang:

Python's cp950 implementation lacks support for HKSCS ('big5hkscs'). This support, which maps HKSCS Big5-EUDC code points to Unicode PUA code points algorithmically, is found in Windows Vista+ as well as in an update for XP.
An experiment session is shown below. I will use '2>>>' to denote a Win32 build of Python 2.7.10 running in a console window set to cp950 (via chcp), and '3>>>' to denote a Python 3.4.3 build running under Cygwin's UTF-8 mintty.

The HKSCS-2008 table at http://www.ogcio.gov.hk/en/business/tech_promotion/ccli/terms/doc/hkscs-2008-big5-iso.txt is used as the list of HKSCS characters; note, though, that its non-PUA mappings are not found in Windows.

Let's start with the first character in that list.

3>>> u'\u43F0'
'䏰'
3>>> print(u'\uF266') # provisional PUA
3>>> u'\u43F0'.encode('cp950') # FAIL
3>>> u'\uF266'.encode('cp950') # FAIL
3>>> u'\u43F0'.encode('hkscs')
b'\x87@'
3>>> u'\uF266'.encode('hkscs') # FAIL

The experiments above show how Python 3 handles HKSCS characters, and how U+43F0 should normally be encoded. Now let's switch to the Windows console, which uses Windows' own decode-to-Unicode routine for cp950.

2>>> print b'\x87@'

Let's try to identify this character:

3>>> u''
'\uf266'

So indeed there is some sort of HKSCS going on. But note that what Windows has is not really any kind of new HKSCS:

> Big5 ucs93 ucs00 ucs03 + 1-6
> 876B 9734 9734 9734
> 876C F292 F292 27BEF
> 876D 5BDB 5BDB 5BDB

2>>> print b'\x87\x6b,\x87\x6c,\x87\x6d'
,,
3>>> u',,'
'\uf291,\uf292,\uf293'

As for all other code pages, you can always find Microsoft's mapping at ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit950.txt. If you are uncomfortable with adding a whole new table and wasting space (this is done for hkscs, by the way), use the algorithmic mapping at https://en.wikipedia.org/wiki/Code_page_950.
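
As a rough illustration of the algorithmic route, the sketch below maps a Big5 EUDC byte pair to a PUA code point. The lead-byte ranges and PUA base values are assumptions taken from one reading of the layout described on the Wikipedia page, not from Microsoft's table, and the name eudc_to_pua is hypothetical. Each lead byte is assumed to cover 157 trail bytes (0x40-0x7E, then 0xA1-0xFE). Under those assumptions the function reproduces the four values seen in the session above (0x8740 gives U+F266, and 0x876B through 0x876D give U+F291 through U+F293).

```python
# Hypothetical sketch of the EUDC -> PUA mapping; ranges and bases are assumptions.
ROW_SIZE = 157  # 63 trail bytes in 0x40-0x7E plus 94 in 0xA1-0xFE

# (first lead byte, last lead byte, PUA code point of the first cell)
EUDC_RANGES = [
    (0xFA, 0xFE, 0xE000),
    (0x8E, 0xA0, 0xE311),
    (0x81, 0x8D, 0xEEB8),
]


def trail_index(trail):
    """Return the 0-based position of a trail byte within a row, or None."""
    if 0x40 <= trail <= 0x7E:
        return trail - 0x40
    if 0xA1 <= trail <= 0xFE:
        return trail - 0xA1 + 63
    return None


def eudc_to_pua(lead, trail):
    """Map a Big5 EUDC byte pair to a PUA code point, or None if not EUDC."""
    idx = trail_index(trail)
    if idx is None:
        return None
    for first, last, base in EUDC_RANGES:
        if first <= lead <= last:
            return base + (lead - first) * ROW_SIZE + idx
    # The last block is assumed to start mid-row, at 0xC6A1, and end at 0xC8FE.
    if 0xC6 <= lead <= 0xC8 and (lead, trail) >= (0xC6, 0xA1):
        return 0xF6B1 + (lead - 0xC6) * ROW_SIZE + idx - 63
    return None


if __name__ == "__main__":
    for pair in [(0x87, 0x40), (0x87, 0x6B), (0x87, 0x6C), (0x87, 0x6D)]:
        print("%02X%02X -> U+%04X" % (pair[0], pair[1], eudc_to_pua(*pair)))
```

A decoder built this way would only need to fall back to eudc_to_pua for byte pairs that the existing cp950 table rejects, so no new table is required.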
----------
components: Unicode
messages: 280811
nosy: Artoria2e5, ezio.melotti, haypo
priority: normal
severity: normal
status: open
title: No HKSCS support in Windows cp950
type: behavior
versions: Python 2.7, Python 3.3, Python 3.4, Python 3.5, Python 3.6, Python 3.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue28693>
_______________________________________