New submission from STINNER Victor <victor.stin...@haypocalc.com>:

I added a test in _PyUnicode_CheckConsistency() (in debug mode) to ensure that all characters of a string are in the range U+0000-U+10FFFF. Locale tests are now failing on Solaris:
-----------------------------------
[ 28/361] test__locale
Assertion failed: maxchar <= 0x10FFFF, file Objects/unicodeobject.c, line 408
Fatal Python error: Aborted

Current thread 0x00000001:
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/test__locale.py", line 134 in test_float_parsing
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/case.py", line 385 in _executeTestPart
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/case.py", line 440 in run
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/case.py", line 492 in __call__
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/suite.py", line 105 in run
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/suite.py", line 67 in __call__
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/suite.py", line 105 in run
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/suite.py", line 67 in __call__
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/runner.py", line 168 in run
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/support.py", line 1368 in _run_suite
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/support.py", line 1402 in run_unittest
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/test__locale.py", line 139 in test_main
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/regrtest.py", line 1203 in runtest_inner
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/regrtest.py", line 906 in runtest
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/regrtest.py", line 709 in main
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/__main__.py", line 13 in <module>
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/runpy.py", line 73 in _run_code
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/runpy.py", line 160 in _run_module_as_main
*** Error code 134
-----------------------------------

The problem is that strxfrm() and wcsxfrm() return strange results
for the string "a" and the English locale (e.g. en_US.UTF-8).

strxfrm(buffer, "a\0", 100) returns 21 (bytes), but only 2 bytes are written ("\x01\x00"). The next bytes are unchanged.

wcsxfrm(buffer, L"a\0", 100) returns 7 (characters). The 7 characters are written, but they are in the range U+1010101..U+1010163, whereas the maximum character of Unicode 6.0 is U+10FFFF (U+101xxxx vs U+10xxxx).

Output of the attached program, strxfrm.c, on OpenSolaris:
-----------------------------------
strxfrm: len=21 0x01 0x00 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff
wcsxfrm: len=7 U+1010163 U+1010101 U+1010103 U+1010101 U+1010103 U+1010101 U+1010101
-----------------------------------

I don't know if it's normal that wcsxfrm() writes characters in the range U+1010101..U+1010163. Is Python supposed to support characters outside the U+0000-U+10FFFF range? chr(0x10FFFF+1) raises a ValueError.

----------
components: Unicode
files: strxfrm.c
messages: 148017
nosy: ezio.melotti, haypo, loewis, pitrou
priority: normal
severity: normal
status: open
title: TestEnUSCollation.test_strxfrm() fails on Solaris
versions: Python 3.3
Added file: http://bugs.python.org/file23741/strxfrm.c

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue13441>
_______________________________________
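
For illustration only, here is a minimal Python sketch of the range check described in the report, applied to the wcsxfrm() values quoted above. It is not the code in Objects/unicodeobject.c and not the attached strxfrm.c; the names MAX_UNICODE, REPORTED_WCSXFRM, out_of_range() and to_str_or_none() are hypothetical helpers made up for this example. The values are copied from the OpenSolaris output in the report, so nothing here calls the C library.

```python
MAX_UNICODE = 0x10FFFF  # highest valid Unicode code point

# wcsxfrm() output for "a", copied from the OpenSolaris run in the report.
REPORTED_WCSXFRM = [
    0x1010163, 0x1010101, 0x1010103, 0x1010101,
    0x1010103, 0x1010101, 0x1010101,
]


def out_of_range(code_points, maxchar=MAX_UNICODE):
    """Return the values that lie outside U+0000..maxchar."""
    return [cp for cp in code_points if not 0 <= cp <= maxchar]


def to_str_or_none(code_points):
    """Build a str from the code points, or return None if chr() rejects one."""
    try:
        return "".join(chr(cp) for cp in code_points)
    except ValueError:
        # chr() raises ValueError for values above 0x10FFFF.
        return None


if __name__ == "__main__":
    bad = out_of_range(REPORTED_WCSXFRM)
    print(len(bad), "of", len(REPORTED_WCSXFRM), "values exceed U+10FFFF")
    print(to_str_or_none(REPORTED_WCSXFRM))
```

With the reported values, every entry falls outside the valid range, and the values cannot be turned into a str through chr(), which matches the ValueError mentioned above.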