New submission from Nadeem Vawda <nadeem.va...@gmail.com>:

I've recently come across a strange failure in the tests for the input() built-in function:

    $ ./python -E -m test -v test_readline test_builtin
    [... snip ...]
    ======================================================================
    FAIL: test_input_tty_non_ascii (test.test_builtin.BuiltinTest)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/home/nadeem/src/cpython/def/Lib/test/test_builtin.py", line 1079, in test_input_tty_non_ascii
        self.check_input_tty("prompté", b"quux\xe9", "utf-8")
      File "/home/nadeem/src/cpython/def/Lib/test/test_builtin.py", line 1070, in check_input_tty
        self.assertEqual(input_result, expected)
    AssertionError: 'quux' != 'quux\udce9'
    - quux
    + quux\udce9
    ?     +

    ======================================================================
    FAIL: test_input_tty_non_ascii_unicode_errors (test.test_builtin.BuiltinTest)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/home/nadeem/src/cpython/def/Lib/test/test_builtin.py", line 1083, in test_input_tty_non_ascii_unicode_errors
        self.check_input_tty("prompté", b"quux\xe9", "ascii")
      File "/home/nadeem/src/cpython/def/Lib/test/test_builtin.py", line 1070, in check_input_tty
        self.assertEqual(input_result, expected)
    AssertionError: 'quux' != 'quux\udce9'
    - quux
    + quux\udce9
    ?     +

The failure only manifests itself if the readline module is loaded before test_builtin runs (hence the presence of test_readline above). It will not occur if regrtest is run with either of the -j or -W flags (which is why it hasn't been seen on the buildbots).

The problem seems to be that readline assumes that its input should use the locale encoding, and silently strips out any undecodable characters. This breaks the tests mentioned above, since they set up sys.stdin to use the surrogateescape error handler, expecting invalid characters to be escaped rather than discarded.
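
To make the expected and observed results concrete, here is a minimal sketch in pure Python that does not touch readline at all. It decodes the same bytes the tests use, b"quux\xe9", with the surrogateescape handler, which is what the tests expect. The errors="ignore" call is only my stand-in for the symptom seen in the failures (the byte vanishing); it is an assumption for illustration, not a claim about how readline drops the byte internally. The variable names are made up for this example.

```python
import io

# The same bytes the failing tests feed to input().
RAW = b"quux\xe9"

# surrogateescape maps each undecodable byte 0xNN to the lone surrogate
# U+DCNN, so 0xE9 becomes U+DCE9 under both codecs used by the tests.
escaped_utf8 = RAW.decode("utf-8", errors="surrogateescape")
escaped_ascii = RAW.decode("ascii", errors="surrogateescape")

# Stand-in for the observed symptom only: the invalid byte is discarded.
# (Assumption for illustration; readline's own code path is different.)
stripped = RAW.decode("utf-8", errors="ignore")

# Because the byte was escaped rather than dropped, it can be recovered.
restored = escaped_utf8.encode("utf-8", errors="surrogateescape")

# A stdin-like text stream configured the way the tests configure sys.stdin.
stream = io.TextIOWrapper(
    io.BytesIO(RAW + b"\n"), encoding="utf-8", errors="surrogateescape"
)
line = stream.readline().rstrip("\n")

# ascii() keeps the output printable even on terminals that cannot
# encode lone surrogates.
print(ascii(escaped_utf8), ascii(stripped), ascii(line))
```

The escaped form round-trips back to the original bytes, which is why discarding the byte before sys.stdin's error handler sees it changes the result of input().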

This problem doesn't crop up if readline is *not* loaded, because in that case PyOS_Readline() falls back to a stdio-based implementation (PyOS_StdioReadline()) that preserves invalid characters, allowing them to be handled properly by sys.stdin's encoding and error handler.

I have been able to fix the test failures with the attached patch, which stops readline from eating invalid characters, making it consistent with the stdio-based fallback. Can someone with more knowledge of readline and/or locale issues advise whether the change is a good idea?

----------
components: Extension Modules
files: rl-locale.diff
keywords: patch
messages: 152080
nosy: nadeem.vawda
priority: normal
severity: normal
stage: patch review
status: open
title: readline-related test_builtin failure
type: behavior
versions: Python 3.2, Python 3.3
Added file: http://bugs.python.org/file24337/rl-locale.diff

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue13886>
_______________________________________