Serhiy Storchaka added the comment:

It seems to me there is something wrong with your test. For example, decoding 
b'\x81\x8d' from CP1251 (as well as from any other code page!) gives you 
u'\x81\x8d', but in CP1251 the codes 0x81 and 0x8D are assigned to different 
characters: 'Ѓ' (U+0403) and 'Ќ' (U+040C).

0x81    0x0403  #CYRILLIC CAPITAL LETTER GJE
0x8D    0x040C  #CYRILLIC CAPITAL LETTER KJE
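
The mapping above can be checked from Python itself. This is only a minimal 
sketch, assuming Python 3 and its built-in "cp1251" codec; the variable names 
are illustrative and not taken from the original test.

```python
import unicodedata

raw = b"\x81\x8d"

# Decode with Python's built-in CP1251 codec.
decoded = raw.decode("cp1251")

# For contrast: Latin-1 maps every byte to the code point with the same
# value, which is what a result of u'\x81\x8d' would correspond to.
identity = raw.decode("latin-1")

for byte, char in zip(raw, decoded):
    print(f"0x{byte:02X} -> U+{ord(char):04X} {unicodedata.name(char)}")
# 0x81 -> U+0403 CYRILLIC CAPITAL LETTER GJE
# 0x8D -> U+040C CYRILLIC CAPITAL LETTER KJE
```

If a test reports u'\x81\x8d' for CP1251, it is behaving like the Latin-1 
line above rather than like the CP1251 codec.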

[1] https://en.wikipedia.org/wiki/Windows-1251
[2] http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1251.TXT
[3] http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1251.txt

----------
nosy: +serhiy.storchaka

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue28712>
_______________________________________