New submission from Rafael Belo <rafaelblsi...@gmail.com>:

There is a mismatch in specification and behavior in some windows encodings.

Some older windows codepages specifications present "UNDEFINED" mapping, 
whereas in reality, they present another behavior which is updated in a section 
named "bestfit".

For example CP1252 has a corresponding bestfit1525: 
CP1252: 
https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT
bestfit1525: 
https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1252.txt


>From which, in CP1252, bytes \x81 \x8d \x8f \x90 \x9d map to "UNDEFINED", 
>whereas in bestfit1252, they map to \u0081 \u008d \u008f \u0090 \u009d 
>respectively. 

In the Windows API, the function 'MultiByteToWideChar' exhibits the bestfit1252 
behavior.


This issue and PR proposes a correction for this behavior, updating the windows 
codepages where some code points where defined as "UNDEFINED" to the 
corresponding bestfit mapping. 


Related issue: https://bugs.python.org/issue28712

----------
components: Demos and Tools, Library (Lib), Unicode, Windows
messages: 401181
nosy: ezio.melotti, lemburg, paul.moore, rafaelblsilva, steve.dower, 
tim.golden, vstinner, zach.ware
priority: normal
severity: normal
status: open
title: Windows cp encodings "UNDEFINED" entries update
type: behavior

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue45120>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to