New submission from Michael Felt <mich...@felt.demon.nl>:
The test fails because byte_str.decode('ascii', 'surrogateescape') is not what ascii() returns for the same bytes when they are passed on the command line.

Assumption: since "check('utf8', [arg_utf8])" succeeds, the parsing of the command line is correct.

DETAILS

>>> arg = 'h\xe9\u20ac'.encode('utf-8')
>>> arg
b'h\xc3\xa9\xe2\x82\xac'
>>> arg.decode('ascii', 'surrogateescape')
'h\udcc3\udca9\udce2\udc82\udcac'

I am having a difficult time getting the syntax correct for all the "escapes", so I added a print statement to the check routine:

test_cmd_line (test.test_utf8_mode.UTF8ModeTests) ...
code:import locale, sys; print("%s:%s" % (locale.getpreferredencoding(), ascii(sys.argv[1:]))) arg:b'h\xc3\xa9\xe2\x82\xac'
out:UTF-8:['h\xe9\u20ac']

code:import locale, sys; print("%s:%s" % (locale.getpreferredencoding(), ascii(sys.argv[1:]))) arg:b'h\xc3\xa9\xe2\x82\xac'
out:ISO8859-1:['h\xc3\xa9\xe2\x82\xac']

Test code with my debug statement (used to generate the above):

    def test_cmd_line(self):
        arg = 'h\xe9\u20ac'.encode('utf-8')
        arg_utf8 = arg.decode('utf-8')
        arg_ascii = arg.decode('ascii', 'surrogateescape')
        code = 'import locale, sys; print("%s:%s" % (locale.getpreferredencoding(), ascii(sys.argv[1:])))'

        def check(utf8_opt, expected, **kw):
            out = self.get_output('-X', utf8_opt, '-c', code, arg, **kw)
            # debug statement added for this report
            print("\ncode:%s arg:%s\nout:%s" % (code, arg, out))
            args = out.partition(':')[2].rstrip()
            self.assertEqual(args, ascii(expected), out)

        check('utf8', [arg_utf8])
        if sys.platform == 'darwin' or support.is_android:
            c_arg = arg_utf8
        else:
            c_arg = arg_ascii
        check('utf8=0', [c_arg], LC_ALL='C')

So the first check succeeds:

    check('utf8', [arg_utf8])

But the second does not:

FAIL: test_cmd_line (test.test_utf8_mode.UTF8ModeTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/data/prj/python/src/python3-3.7.0/Lib/test/test_utf8_mode.py", line 225, in test_cmd_line
    check('utf8=0', [c_arg], LC_ALL='C')
  File "/data/prj/python/src/python3-3.7.0/Lib/test/test_utf8_mode.py", line 218, in check
    self.assertEqual(args, ascii(expected), out)
AssertionError: "['h\\xc3\\xa9\\xe2\\x82\\xac']" != "['h\\udcc3\\udca9\\udce2\\udc82\\udcac']"
- ['h\xc3\xa9\xe2\x82\xac']
+ ['h\udcc3\udca9\udce2\udc82\udcac']
 : ISO8859-1:['h\xc3\xa9\xe2\x82\xac']

I tried using arg as the "expected" value, but arg is still a bytes object, whereas the command-line result is not (it is printed as a str):

AssertionError: "['h\\xc3\\xa9\\xe2\\x82\\xac']" != "[b'h\\xc3\\xa9\\xe2\\x82\\xac']"
- ['h\xc3\xa9\xe2\x82\xac']
+ [b'h\xc3\xa9\xe2\x82\xac']
?  +
 : ISO8859-1:['h\xc3\xa9\xe2\x82\xac']

----------
components: Interpreter Core, Tests
messages: 323214
nosy: Michael.Felt
priority: normal
severity: normal
status: open
title: AIX: test_utf8_mode.test_cmd_line fails
type: behavior
versions: Python 3.7, Python 3.8

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue34347>
_______________________________________
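
The mismatch in the second check can be reproduced without spawning a subprocess. The following is a minimal sketch, assuming the child interpreter decoded the argument as ISO8859-1 (Latin-1), which is what the "ISO8859-1:" prefix in the debug output suggests. The helper name decode_variants is hypothetical and is not part of the test suite.

```python
# Sketch: show how the same UTF-8 bytes look after three different decodings.
# Assumption: under LC_ALL=C the child process on AIX used ISO8859-1 (Latin-1),
# as suggested by the "ISO8859-1:" prefix in the reported debug output.

def decode_variants(raw: bytes) -> dict:
    """Return ascii() of raw decoded three ways (hypothetical helper)."""
    return {
        "utf-8": ascii(raw.decode("utf-8")),
        "ascii+surrogateescape": ascii(raw.decode("ascii", "surrogateescape")),
        "latin-1": ascii(raw.decode("latin-1")),
    }

arg = 'h\xe9\u20ac'.encode('utf-8')
variants = decode_variants(arg)
for name, shown in variants.items():
    print(f"{name}: {shown}")
```

The "latin-1" line has the same form as the out: value in the failing run, while the test expects the "ascii+surrogateescape" form. Both decodings are lossless for these bytes; they only map the non-ASCII bytes to different code points.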