Eryk Sun added the comment:

Serhiy, on Windows, single-byte codepages map every byte value, even if only
to a Unicode C1 control code [1].

For example:

    import ctypes
    kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)

    MB_ERR_INVALID_CHARS = 0x00000008

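    def mbtwc_errcheck(result, func, args):
        # A zero result means failure, unless the input was empty to
        # begin with (args[3] is the input length).
        if not result and args[3]:
            raise ctypes.WinError(ctypes.get_last_error())
        return args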

    kernel32.MultiByteToWideChar.errcheck = mbtwc_errcheck

    def decode(codepage, data, strict=True):
        flags = MB_ERR_INVALID_CHARS if strict else 0
        # First call: no output buffer, so the result is the required length.
        n = kernel32.MultiByteToWideChar(codepage, flags,
                                         data, len(data),
                                         None, 0)
        buf = (ctypes.c_wchar * n)()
        # Second call: decode into the buffer.
        kernel32.MultiByteToWideChar(codepage, flags,
                                     data, len(data),
                                     buf, n)
        return buf.value


    codepages = [437, 874] + list(range(1250, 1259))
    for cp in codepages:
        print('cp%d:' % cp, ascii(decode(cp, b'\x81\x8d')))

Output:
    
    cp437: '\xfc\xec'
    cp874: '\x81\x8d'
    cp1250: '\x81\u0164'
    cp1251: '\u0403\u040c'
    cp1252: '\x81\x8d'
    cp1253: '\x81\x8d'
    cp1254: '\x81\x8d'
    cp1255: '\x81\x8d'
    cp1256: '\u067e\u0686'
    cp1257: '\x81\xa8'
    cp1258: '\x81\x8d'
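
For comparison, the same two bytes can be run through Python's built-in
charmap codecs, which do not need Windows. This is only a sketch:
`try_decode` and the `c1passthrough` error handler are hypothetical names
introduced here. The handler merely imitates the pass-through seen above for
bytes that Python's own tables leave undefined; it makes no claim about how
MultiByteToWideChar works internally.

```python
import codecs


def c1_passthrough(exc):
    """Map each undecodable byte to the code point of the same value."""
    if isinstance(exc, UnicodeDecodeError):
        bad = exc.object[exc.start:exc.end]
        # Resume decoding right after the offending byte(s).
        return ''.join(map(chr, bad)), exc.end
    raise exc


# Hypothetical handler name, registered for this illustration only.
codecs.register_error('c1passthrough', c1_passthrough)


def try_decode(cp, data, errors='strict'):
    """Decode with Python's 'cpNNN' codec; report the failure reason
    instead of raising."""
    try:
        return ascii(data.decode('cp%d' % cp, errors))
    except UnicodeDecodeError as exc:
        return 'UnicodeDecodeError: %s' % exc.reason


for cp in (437, 1251, 1252):
    print('cp%d:' % cp,
          try_decode(cp, b'\x81\x8d'), '|',
          try_decode(cp, b'\x81\x8d', 'c1passthrough'))
```

Of the three codepages sampled, the built-in cp1252 codec rejects both bytes
in strict mode, while cp437 and cp1251 decode them to the same characters as
in the output above. With the pass-through handler, cp1252 yields the C1
controls as well.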

[1]: https://en.wikipedia.org/wiki/C0_and_C1_control_codes

----------
nosy: +eryksun

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue28712>
_______________________________________