STINNER Victor <vstin...@redhat.com> added the comment:

>   File "/var/osipovmi/cpython/Lib/test/test_utf8_mode.py", line 214, in check
>     self.assertEqual(args, ascii(expected), out)
> AssertionError: "['h\\xa7\\xe9']" != "['h\\xcf\\xd5']"
> - ['h\xa7\xe9']
> + ['h\xcf\xd5']
>  : roman8:['h\xa7\xe9']

Hmm, this looks like a bug in the HP-UX C library. It announces that the
locale encoding is "roman8", but the mbstowcs() function decodes bytes as
Latin1. The updated test uses the byte string b'h\xa7\xe9'. Since the OS
announces the roman8 encoding, the test expects the Unicode string
b'h\xa7\xe9'.decode('roman8') == 'h\xcf\xd5', but it gets 'h\xa7\xe9',
which looks as if the byte string had been decoded from Latin1:
b'h\xa7\xe9'.decode('latin1') == 'h\xa7\xe9'.
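
The two decodings can be compared directly in Python. This is only a small
illustration of the mismatch described above; the variable names are made up
for the example, and it uses Python's own codecs, not the HP-UX C library.

```python
# The byte string used by the updated test.
raw = b'h\xa7\xe9'

# What the test expects, given that the OS announces "roman8".
as_roman8 = raw.decode('roman8')

# What mbstowcs() appears to produce on HP-UX: a Latin1 decoding.
as_latin1 = raw.decode('latin1')

print(ascii(as_roman8))  # 'h\xcf\xd5'
print(ascii(as_latin1))  # 'h\xa7\xe9'
```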

Michael: would you mind compiling and running the attached c_locale.c test
program? It sets the LC_ALL locale to C, dumps the locales (LC_ALL, LC_CTYPE,
nl_langinfo(CODESET)), and then decodes all bytes from the locale encoding
(LC_CTYPE). The output should help me to understand what the *effective*
encoding of HP-UX is for the C locale.

You may also modify c_locale.c to replace "C" with "POSIX", to see if the
behaviour is different.
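
For a quick look without a compiler, the locale dump can be roughly
approximated from Python. This is a sketch, not a replacement for
c_locale.c: dump_c_locale() is a hypothetical helper name, it relies on
locale.nl_langinfo() (Unix only), and it does not decode any bytes through
the C library, so it only shows what the locale *announces*.

```python
import locale


def dump_c_locale(name="C"):
    """Set LC_ALL to `name` and report what the C library announces."""
    locale.setlocale(locale.LC_ALL, name)
    return {
        # Calling setlocale() with only a category queries the current value.
        "LC_ALL": locale.setlocale(locale.LC_ALL),
        "LC_CTYPE": locale.setlocale(locale.LC_CTYPE),
        "CODESET": locale.nl_langinfo(locale.CODESET),
    }


if __name__ == "__main__":
    for key, value in dump_c_locale("C").items():
        print(f"{key} = {value}")
```

Calling dump_c_locale("POSIX") instead covers the "POSIX" variant.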

----------
Added file: https://bugs.python.org/file47767/c_locale.c

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue34403>
_______________________________________