STINNER Victor <[email protected]> added the comment:
> File "/var/osipovmi/cpython/Lib/test/test_utf8_mode.py", line 214, in check
> self.assertEqual(args, ascii(expected), out)
> AssertionError: "['h\\xa7\\xe9']" != "['h\\xcf\\xd5']"
> - ['h\xa7\xe9']
> + ['h\xcf\xd5']
> : roman8:['h\xa7\xe9']
Hum, this looks like a bug in the C library of HP-UX. It announces that the
locale encoding is "roman8", but the mbstowcs() function decodes as if the
encoding were Latin1. The updated test uses the byte string b'h\xa7\xe9'. The
OS announces the encoding roman8, so the test expects the Unicode string
b'h\xa7\xe9'.decode('roman8') == 'h\xcf\xd5'. Instead it gets 'h\xa7\xe9',
which looks like the byte string has been decoded from Latin1:
b'h\xa7\xe9'.decode('latin1') == 'h\xa7\xe9'.
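
To make the mismatch concrete, here is a small sketch that decodes the test
bytes with both codecs in Python. The variable names are only illustrative;
the byte string and the two codec names come from the test failure above.

```python
# Byte string used by the updated test.
data = b'h\xa7\xe9'

# What the test expects when the OS announces "roman8".
expected = data.decode('roman8')

# What the test actually observed, which matches a Latin1 decode.
observed = data.decode('latin1')

print(ascii(expected))  # 'h\xcf\xd5'
print(ascii(observed))  # 'h\xa7\xe9'
```

The two results differ only in the non-ASCII bytes, which is why the failure
points at the decoder and not at the test data.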
Michael: would you mind compiling and running the attached c_locale.c test
program? It sets the LC_ALL locale to C, dumps the locales (LC_ALL, LC_CTYPE,
nl_langinfo(CODESET)), and then decodes all bytes from the locale encoding
(LC_CTYPE). The output should help me understand what the *effective*
encoding of HP-UX is for the C locale.
You may modify c_locale.c to replace "C" with "POSIX", to see if the
behaviour is different.
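
For readers without a C compiler at hand, a rough Python approximation of
that kind of probe might look like the sketch below. It is not the attached
c_locale.c: the function name probe_locale is hypothetical, and it assumes a
POSIX-style platform where ctypes.CDLL(None) exposes the C library's
mbstowcs() and where wchar_t is 2 or 4 bytes wide.

```python
import ctypes
import locale


def probe_locale(name="C"):
    """Return (codeset, mapping) for the given locale name.

    mapping[b] is the code point mbstowcs() produced for the single
    byte b, or None if the C library rejected that byte.
    """
    locale.setlocale(locale.LC_ALL, name)
    codeset = locale.nl_langinfo(locale.CODESET)

    # Assumption: dlopen(NULL) gives access to libc on this platform.
    libc = ctypes.CDLL(None)
    libc.mbstowcs.restype = ctypes.c_size_t
    libc.mbstowcs.argtypes = [ctypes.c_void_p, ctypes.c_char_p,
                              ctypes.c_size_t]

    # Read wchar_t as a plain integer so odd values cannot raise.
    wide_int = {2: ctypes.c_uint16,
                4: ctypes.c_uint32}[ctypes.sizeof(ctypes.c_wchar)]

    mapping = {}
    for byte in range(1, 256):  # skip NUL, which terminates the C string
        buf = (wide_int * 2)()
        count = libc.mbstowcs(buf, bytes([byte]), 1)
        mapping[byte] = buf[0] if count == 1 else None
    return codeset, mapping


codeset, mapping = probe_locale("C")
print("CODESET:", codeset)
for byte in (0x68, 0xa7, 0xe9):
    print(hex(byte), "->", mapping[byte])
```

Calling probe_locale("POSIX") instead gives the variant suggested above. If
the announced codeset and the mapping of bytes such as 0xa7 and 0xe9
disagree, that would point to the same inconsistency as the test failure.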
----------
Added file: https://bugs.python.org/file47767/c_locale.c
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue34403>
_______________________________________