[issue18183] Calling .lower() on certain unicode string raises SystemError
Roundup Robot added the comment:

New changeset b11507395ce4 by Serhiy Storchaka in branch '3.3':
Add tests for issue #18183.
http://hg.python.org/cpython/rev/b11507395ce4

New changeset 17c9f1627baf by Serhiy Storchaka in branch 'default':
Add tests for issue #18183.
http://hg.python.org/cpython/rev/17c9f1627baf

--
___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue18183 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
Changes by Serhiy Storchaka storch...@gmail.com:

--
stage: patch review -> committed/rejected
status: open -> closed
New submission from Dave Challis:

This occurred when attempting to decode invalid UTF-8 bytes using errors='replace', then attempting to lowercase the resulting Unicode string. This was also tested in Python 2.7, but it doesn't occur there.

Code to reproduce:

x = b'\xe2\xb3\x99\xb3\xd1\x9f\xe0vjGd|\x12\xf2\x84\xac\xae$\xa4\xae+\xa4sbtf$fG\xfb\xe6?.\xe2sbv\x14\xcb\x89\x98\xda\xd9\x99\xda\xb9d9\x1bY\x99\xb7\xb3\x1b9\xa2y*B\xa3\xba\xefjg\xe2\x92Et\x85~\xbf\x8a\xe3\x919\x8bvc\xfb#$$.\xber6Db.#4\xa4.\x13RtI\x10\xed\x9c\xd0\x98\xb8\x18\x91\x99\\\nC\x13\x8dV\xccL\xf4\x89\x9c\x90'
x = x.decode('utf-8', errors='replace')
x.lower()

Output:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
SystemError: invalid maximum character passed to PyUnicode_New

--
components: Unicode
messages: 190907
nosy: davechallis, ezio.melotti
priority: normal
severity: normal
status: open
title: Calling .lower() on certain unicode string raises SystemError
type: behavior
versions: Python 3.3
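Most of the reported input is invalid UTF-8, but it also contains two well-formed 4-byte sequences: \xf2\x84\xac\xae near the start and \xf4\x89\x9c\x90 at the very end. The sketch below isolates them (the 8-byte literal is hand-extracted from the report for illustration only) and shows that they decode to two characters outside the Basic Multilingual Plane, the same code points used in the shorter reproduction later in this thread.

```python
# Two well-formed 4-byte UTF-8 sequences taken from the reporter's input
# (hand-extracted for this sketch; the surrounding invalid bytes are omitted).
data = b'\xf2\x84\xac\xae\xf4\x89\x9c\x90'

# No byte here is invalid, so errors='replace' never actually substitutes.
text = data.decode('utf-8', errors='replace')

# Each 4-byte sequence becomes one code point above U+FFFF (non-BMP).
code_points = [hex(ord(ch)) for ch in text]
print(code_points)  # ['0x84b2e', '0x109710']

# On an interpreter with the fix, lowercasing succeeds; neither character
# has a case mapping, so the string comes back unchanged.
print(text.lower() == text)  # True
```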
Changes by Serhiy Storchaka storch...@gmail.com:

--
components: +Interpreter Core
nosy: +serhiy.storchaka
stage: -> needs patch
versions: +Python 3.4
Serhiy Storchaka added the comment:

Minimal example:

'\U00010000\U00100000'.lower()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
SystemError: invalid maximum character passed to PyUnicode_New

--
Changes by Serhiy Storchaka storch...@gmail.com:

--
assignee: -> serhiy.storchaka
Serhiy Storchaka added the comment:

It happens due to the use of the fast MAX_MAXCHAR() macro, which can produce a maxchar out of range ((0x10000 | 0x100000) > MAX_UNICODE).

--
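A quick Python check of that arithmetic: bitwise OR never yields less than either operand, but it is not the maximum, and for U+10000 and U+100000 it lands above MAX_UNICODE (0x10FFFF). The helper or_maxchar below is a hypothetical stand-in for the OR-based macro, not CPython's actual C code.

```python
MAX_UNICODE = 0x10FFFF  # largest valid Unicode code point


def or_maxchar(a: int, b: int) -> int:
    """Hypothetical stand-in for an OR-based 'maximum' of two code points."""
    return a | b


a, b = 0x10000, 0x100000  # the two characters in the minimal example
combined = or_maxchar(a, b)

print(hex(combined))           # 0x110000
print(combined > MAX_UNICODE)  # True: outside the Unicode range
print(hex(max(a, b)))          # 0x100000: the correct maximum
```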
Amaury Forgeot d'Arc added the comment:

a = chr(0x84b2e)+chr(0x109710)
a.lower()
SystemError: invalid maximum character passed to PyUnicode_New

The MAX_MAXCHAR() macro only works for 'maxchar' values, like 0xff, 0x...; in do_upper_or_lower() it's used with arbitrary UCS4 values.

--
nosy: +amaury.forgeotdarc, haypo
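Why OR is only safe for 'maxchar' values: for the bounds of the PEP 393 string kinds (0x7F, 0xFF, 0xFFFF, 0x10FFFF), each bound's set bits are a subset of the next one's, so OR-ing any two of them gives the larger one. Arbitrary code points have no such property. The sketch below checks both facts; accumulate_maxchar is a hypothetical, simplified model of the loop in do_upper_or_lower() that skips the actual case mapping (harmless here, since neither character has one).

```python
from itertools import combinations_with_replacement

MAX_UNICODE = 0x10FFFF
# Maximum characters of the compact string kinds: ASCII, Latin-1, UCS-2, UCS-4.
MAXCHAR_BOUNDS = [0x7F, 0xFF, 0xFFFF, MAX_UNICODE]

# For these bounds, OR and max agree on every pair.
for x, y in combinations_with_replacement(MAXCHAR_BOUNDS, 2):
    assert x | y == max(x, y)


def accumulate_maxchar(chars, combine):
    """Hypothetical model: fold each character's code point into maxchar."""
    maxchar = 0
    for ch in chars:
        maxchar = combine(maxchar, ord(ch))
    return maxchar


s = chr(0x84B2E) + chr(0x109710)  # the example from this comment
buggy = accumulate_maxchar(s, lambda m, c: m | c)  # OR-based accumulation
fixed = accumulate_maxchar(s, max)                 # true maximum

print(hex(buggy), buggy > MAX_UNICODE)  # 0x18df3e True
print(hex(fixed))                       # 0x109710
```

The OR-based result is then passed on as a maximum character, which is exactly the value PyUnicode_New rejects; a plain maximum is always a valid bound.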
Roundup Robot added the comment:

New changeset 89b106d298a9 by Benjamin Peterson in branch '3.3':
remove MAX_MAXCHAR because it's unsafe for computing maximum codepoitn value (see #18183)
http://hg.python.org/cpython/rev/89b106d298a9

New changeset 668aba845fb2 by Benjamin Peterson in branch 'default':
merge 3.3 (#18183)
http://hg.python.org/cpython/rev/668aba845fb2

--
nosy: +python-dev
Benjamin Peterson added the comment:

I simply removed the MAX_MAXCHAR micro-optimization, since it seems fairly unsafe. Interested parties can restore it safely if they wish.

--
nosy: +benjamin.peterson
resolution: -> fixed
status: open -> closed
STINNER Victor added the comment:

Oops, my MAX_MAXCHAR macro was too optimized :-) (the result is incorrect). It shows us that the test suite does not have enough tests on non-BMP characters.

--
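One way to widen non-BMP coverage along those lines is a sweep over the case-conversion methods using the strings from this thread. This is a hypothetical sketch, not the test that was eventually committed; the sample list and method list are assumptions. It only checks that each call returns a str instead of raising, as lower() did before the fix.

```python
# Hypothetical regression sweep: strings mixing characters from
# different supplementary planes.
NON_BMP_SAMPLES = [
    '\U00010000\U00100000',        # minimal example from this issue
    chr(0x84B2E) + chr(0x109710),  # characters decoded from the original report
]

CASE_METHODS = ('lower', 'upper', 'casefold', 'swapcase', 'capitalize', 'title')


def check_non_bmp_case_methods():
    """Call every case method on every sample; fail loudly on a bad result."""
    for s in NON_BMP_SAMPLES:
        for name in CASE_METHODS:
            result = getattr(s, name)()
            assert isinstance(result, str), (name, ascii(s))
    return True


print(check_non_bmp_case_methods())  # True
```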
Serhiy Storchaka added the comment:

Here are additional tests for this issue.

--
keywords: +patch
stage: needs patch -> patch review
status: closed -> open
Added file: http://bugs.python.org/file30533/test_issue18183.patch
STINNER Victor added the comment:

+'\U00010000\U00100000'.lower()

Why not check the result of these calls?

--
Serhiy Storchaka added the comment:

The result is trivial. Wouldn't checking the result distract attention from the main issue?

--
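For comparison, this is roughly what checking the result would look like. Neither U+10000 (a Linear B syllable) nor U+100000 (a private-use character) has a case mapping, so each conversion should return its input unchanged, and the extra assertion costs one line. The class and test names are hypothetical.

```python
import unittest


class Issue18183ResultTests(unittest.TestCase):
    """Hypothetical test that checks return values, not just 'no exception'."""

    def test_non_bmp_case_conversion_results(self):
        s = '\U00010000\U00100000'
        # Neither character has a case mapping, so every conversion
        # should return the input unchanged.
        for name in ('lower', 'upper', 'casefold', 'swapcase', 'capitalize', 'title'):
            with self.subTest(method=name):
                self.assertEqual(getattr(s, name)(), s)


if __name__ == '__main__':
    # exit=False keeps the interpreter running after the tests finish.
    unittest.main(argv=['issue18183'], exit=False)
```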