Eryk Sun added the comment:
Serhiy, single-byte codepages map every byte value, even if it's just to a
Unicode C1 control code [1].
For example:
import ctypes
kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
MB_ERR_INVALID_CHARS = 0x00000008
def mbtwc_errcheck(result, func, args):
if not result and args[-1]:
raise ctypes.WinError(ctypes.get_last_error())
return args
kernel32.MultiByteToWideChar.errcheck = mbtwc_errcheck
def decode(codepage, data, strict=True):
flags = MB_ERR_INVALID_CHARS if strict else 0
n = kernel32.MultiByteToWideChar(codepage, flags,
data, len(data),
None, 0)
buf = (ctypes.c_wchar * n)()
kernel32.MultiByteToWideChar(codepage, flags,
data, len(data),
buf, n)
return buf.value
codepages = [437, 874] + list(range(1250, 1259))
for cp in codepages:
print('cp%d:' % cp, ascii(decode(cp, b'\x81\x8d')))
Output:
cp437: '\xfc\xec'
cp874: '\x81\x8d'
cp1250: '\x81\u0164'
cp1251: '\u0403\u040c'
cp1252: '\x81\x8d'
cp1253: '\x81\x8d'
cp1254: '\x81\x8d'
cp1255: '\x81\x8d'
cp1256: '\u067e\u0686'
cp1257: '\x81\xa8'
cp1258: '\x81\x8d'
[1]: https://en.wikipedia.org/wiki/C0_and_C1_control_codes
----------
nosy: +eryksun
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue28712>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe:
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com