New submission from Rafael Belo <rafaelblsi...@gmail.com>:

There is a mismatch between the specification and the behavior of some Windows encodings.
Some older Windows code page specifications mark certain entries as "UNDEFINED", whereas in practice those bytes behave differently, as documented in the corresponding "bestfit" mapping files. For example, CP1252 has a corresponding bestfit1252:

CP1252: https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT
bestfit1252: https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1252.txt

In CP1252, bytes \x81 \x8d \x8f \x90 \x9d map to "UNDEFINED", whereas in bestfit1252 they map to \u0081 \u008d \u008f \u0090 \u009d, respectively.

In the Windows API, the function 'MultiByteToWideChar' exhibits the bestfit1252 behavior.

This issue and PR propose a correction for this behavior: updating the Windows code pages in which some code points were defined as "UNDEFINED" to use the corresponding bestfit mapping.

Related issue: https://bugs.python.org/issue28712

----------
components: Demos and Tools, Library (Lib), Unicode, Windows
messages: 401181
nosy: ezio.melotti, lemburg, paul.moore, rafaelblsilva, steve.dower, tim.golden, vstinner, zach.ware
priority: normal
severity: normal
status: open
title: Windows cp encodings "UNDEFINED" entries update
type: behavior

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue45120>
_______________________________________
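The difference described in this report can be illustrated with a minimal Python sketch. It assumes the CP1252 codec tables shipped with CPython at the time of the report, in which the five bytes above decode as undefined. Rather than changing the codec (which is what the PR proposes), it emulates the bestfit1252 decoding behavior with a custom error handler that substitutes the C1 control character with the same numeric value. The handler name "bestfit1252" and the helper names are hypothetical, chosen for illustration only, and the sketch covers decoding only, not encoding.

```python
import codecs

# Bytes listed as UNDEFINED in CP1252.TXT; bestfit1252.txt maps each one
# to the C1 control character with the same numeric value.
# (Hypothetical helper table for this sketch.)
BESTFIT1252_UNDEFINED = frozenset({0x81, 0x8D, 0x8F, 0x90, 0x9D})


def bestfit1252_decode_errors(exc):
    """Hypothetical error handler emulating bestfit1252 when decoding.

    Replaces each undefined CP1252 byte in the failing range with the
    code point of the same value (e.g. b'\\x81' -> '\\u0081'); any other
    failure is re-raised unchanged.
    """
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    if not all(b in BESTFIT1252_UNDEFINED for b in bad):
        raise exc
    # The byte value doubles as the code point: chr(0x81) == '\u0081'.
    return "".join(chr(b) for b in bad), exc.end


# Register under a hypothetical name so it can be passed as errors=...
codecs.register_error("bestfit1252", bestfit1252_decode_errors)

if __name__ == "__main__":
    data = b"\x80\x81\x8d\x8f\x90\x9d"
    # Strict decoding of these bytes fails with the affected tables;
    # with the handler, each undefined byte becomes its C1 control char.
    print(ascii(data.decode("cp1252", errors="bestfit1252")))
```

Because an error handler is consulted only when the codec reports a failure, it has no effect on bytes that CP1252 already defines; updating the codec tables themselves, as this issue proposes, would make such a workaround unnecessary.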