[issue28693] No HKSCS support in Windows cp950

2016-11-18 Thread Terry J. Reedy

Changes by Terry J. Reedy :


--
versions:  -Python 3.3, Python 3.4

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28693] No HKSCS support in Windows cp950

2016-11-16 Thread Mingye Wang

Mingye Wang added the comment:

Update: the test script at issue28712 can be modified to show this issue too.

--
components: +Windows
nosy: +paul.moore, steve.dower, tim.golden, zach.ware




[issue28693] No HKSCS support in Windows cp950

2016-11-15 Thread STINNER Victor

STINNER Victor added the comment:

Python supports native Windows code pages through the
codecs.code_page_encode() and codecs.code_page_decode() functions
(available on Windows only). See for example Lib/encodings/cp65001.py:
this codec is not implemented in Python, but is a wrapper around the
native Windows functions MultiByteToWideChar and WideCharToMultiByte.
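
As a rough illustration of the functions mentioned above, here is a minimal sketch that decodes bytes through the native code page 950. The helper name decode_with_windows_code_page is hypothetical, and the guard is there because these codecs functions only exist on Windows builds; no particular output is claimed here.

```python
import codecs


def decode_with_windows_code_page(data, code_page=950):
    """Decode bytes with the native Windows code page (hypothetical helper).

    Returns None when codecs.code_page_decode is unavailable, which is
    the case on every non-Windows build of Python.
    """
    decode = getattr(codecs, "code_page_decode", None)
    if decode is None:
        return None
    # final=True: treat the input as complete rather than as a partial chunk.
    text, _consumed = decode(code_page, data, "strict", True)
    return text


if __name__ == "__main__":
    # On Windows this asks MultiByteToWideChar what cp950 makes of 0x8740.
    print(ascii(decode_with_windows_code_page(b"\x87@")))
```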

--




[issue28693] No HKSCS support in Windows cp950

2016-11-14 Thread Mingye Wang

New submission from Mingye Wang:

Python's cp950 implementation lacks support for HKSCS ('big5hkscs'). This 
support, which maps HKSCS Big5-EUDC code points to Unicode PUA code points 
algorithmically, is found in Windows Vista+ as well as an update for XP.

An experiment session is shown below. I will use '2>>>' to denote a Win32 build 
of Python 2.7.10 running under a console window set to cp950 (via chcp), and 
'3>>>' to denote a Python 3.4.3 build running under Cygwin's UTF-8 mintty. 
The HKSCS-2008 table at 
http://www.ogcio.gov.hk/en/business/tech_promotion/ccli/terms/doc/hkscs-2008-big5-iso.txt
 is used as the list of HKSCS characters; note, though, that its non-PUA 
mappings are not found in Windows.

Let's start with the first character in that list.

3>>> u'\u43F0'
'䏰'
3>>> print(u'\uF266') # provisional PUA

3>>> u'\u43F0'.encode('cp950') # FAIL
3>>> u'\uF266'.encode('cp950') # FAIL
3>>> u'\u43F0'.encode('hkscs')
b'\x87@'
3>>> u'\uF266'.encode('hkscs') # FAIL
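
The same checks can be written as a small script, sketched here for Python 3. The helper try_encode is hypothetical; it returns None where the session shows FAIL, so the two codecs can be compared side by side. Only the 'hkscs' result is taken from the session; the other results may depend on the Python version.

```python
def try_encode(text, encoding):
    """Return the encoded bytes, or None if the codec rejects the text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    for char in ("\u43f0", "\uf266"):
        for encoding in ("cp950", "hkscs"):
            # ascii() keeps the output readable even for PUA characters.
            print(ascii(char), encoding, try_encode(char, encoding))
```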

The experiments above show how Python 3 handles HKSCS characters, and how 
U+43F0 should normally be encoded. Now let's switch to the Windows console, 
which uses Windows' own decode-to-Unicode routine for cp950.

2>>> print b'\x87@'


Let's identify the character the console printed by pasting it into the Python 3 session:

3>>> u''
'\uf266'

So there is indeed some form of HKSCS support going on. But note that what 
Windows has does not follow any of the newer HKSCS tables:

> Big5   ucs93   ucs00   ucs03 + 1-6
> 876B   9734    9734    9734
> 876C   F292    F292    27BEF
> 876D   5BDB    5BDB    5BDB

2>>> print b'\x87\x6b,\x87\x6c,\x87\x6d'
,,
3>>> u',,'
'\uf291,\uf292,\uf293'

Just as for all other code pages, you can always find Microsoft's mapping at 
ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit950.txt.
 If you are uncomfortable with adding a whole new table and wasting space (this 
is done for hkscs btw), use the algorithmic mapping at 
https://en.wikipedia.org/wiki/Code_page_950.
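
For reference, a minimal sketch of what such an algorithmic mapping could look like for the lead bytes seen in the session above. The base code point U+EEB8 and the lead-byte range 0x81-0x8D are assumptions taken from the commonly documented cp950 EUDC layout, not from Python or Windows source; the function names are hypothetical. Other EUDC ranges would need their own base values.

```python
TRAILS_PER_ROW = 157  # 63 trail bytes in 0x40-0x7E plus 94 in 0xA1-0xFE


def trail_index(trail):
    """Position of a Big5 trail byte within its row (0..156)."""
    if 0x40 <= trail <= 0x7E:
        return trail - 0x40
    if 0xA1 <= trail <= 0xFE:
        return trail - 0xA1 + 63
    raise ValueError("not a Big5 trail byte: 0x%02X" % trail)


def eudc_to_pua(lead, trail, first_lead=0x81, last_lead=0x8D, base=0xEEB8):
    """Map a Big5 EUDC byte pair to a PUA code point (assumed layout)."""
    if not first_lead <= lead <= last_lead:
        raise ValueError("lead byte outside assumed range: 0x%02X" % lead)
    return base + TRAILS_PER_ROW * (lead - first_lead) + trail_index(trail)


if __name__ == "__main__":
    for lead, trail in ((0x87, 0x40), (0x87, 0x6B), (0x87, 0x6C), (0x87, 0x6D)):
        print("%02X%02X -> U+%04X" % (lead, trail, eudc_to_pua(lead, trail)))
```

Under these assumptions the sketch reproduces the values observed in the session: 8740 gives U+F266, and 876B, 876C, 876D give U+F291, U+F292, U+F293.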

--
components: Unicode
messages: 280811
nosy: Artoria2e5, ezio.melotti, haypo
priority: normal
severity: normal
status: open
title: No HKSCS support in Windows cp950
type: behavior
versions: Python 2.7, Python 3.3, Python 3.4, Python 3.5, Python 3.6, Python 3.7
