[issue18183] Calling .lower() on certain unicode string raises SystemError

2013-06-12 Thread Roundup Robot

Roundup Robot added the comment:

New changeset b11507395ce4 by Serhiy Storchaka in branch '3.3':
Add tests for issue #18183.
http://hg.python.org/cpython/rev/b11507395ce4

New changeset 17c9f1627baf by Serhiy Storchaka in branch 'default':
Add tests for issue #18183.
http://hg.python.org/cpython/rev/17c9f1627baf

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18183
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18183] Calling .lower() on certain unicode string raises SystemError

2013-06-12 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


--
stage: patch review -> committed/rejected
status: open -> closed




[issue18183] Calling .lower() on certain unicode string raises SystemError

2013-06-10 Thread Dave Challis

New submission from Dave Challis:

This occurred when attempting to decode invalid UTF-8 bytes using 
errors='replace', then attempting to lowercase the produced unicode string.

This was also tested in Python 2.7, where the error does not occur.

Code to reproduce:

x = b'\xe2\xb3\x99\xb3\xd1\x9f\xe0vjGd|\x12\xf2\x84\xac\xae$\xa4\xae+\xa4sbtf$fG\xfb\xe6?.\xe2sbv\x14\xcb\x89\x98\xda\xd9\x99\xda\xb9d9\x1bY\x99\xb7\xb3\x1b9\xa2y*B\xa3\xba\xefjg\xe2\x92Et\x85~\xbf\x8a\xe3\x919\x8bvc\xfb#$$.\xber6Db.#4\xa4.\x13RtI\x10\xed\x9c\xd0\x98\xb8\x18\x91\x99\\\nC\x13\x8dV\xccL\xf4\x89\x9c\x90'

x = x.decode('utf-8', errors='replace')

x.lower()


Output:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
SystemError: invalid maximum character passed to PyUnicode_New

--
components: Unicode
messages: 190907
nosy: davechallis, ezio.melotti
priority: normal
severity: normal
status: open
title: Calling .lower() on certain unicode string raises SystemError
type: behavior
versions: Python 3.3




[issue18183] Calling .lower() on certain unicode string raises SystemError

2013-06-10 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


--
components: +Interpreter Core
nosy: +serhiy.storchaka
stage:  -> needs patch
versions: +Python 3.4




[issue18183] Calling .lower() on certain unicode string raises SystemError

2013-06-10 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Minimal example:

>>> '\U00010000\U00100000'.lower()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
SystemError: invalid maximum character passed to PyUnicode_New

--




[issue18183] Calling .lower() on certain unicode string raises SystemError

2013-06-10 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


--
assignee:  -> serhiy.storchaka




[issue18183] Calling .lower() on certain unicode string raises SystemError

2013-06-10 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

This happens because of the fast MAX_MAXCHAR() macro, which can produce a
maxchar that is out of range (0x10000 | 0x100000 > MAX_UNICODE).
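
The arithmetic can be checked with a short sketch. It assumes the macro combines its two arguments with a bitwise OR, as the expression above suggests, and it takes U+10000 and U+100000 as illustrative inputs. The helper name or_maxchar is hypothetical and only models the shortcut; it is not CPython code.

```python
MAX_UNICODE = 0x10FFFF  # highest valid Unicode code point


def or_maxchar(a, b):
    """Hypothetical model of the OR-based shortcut (an assumption, not CPython code)."""
    return a | b


# Two valid non-BMP code points, used here as illustrative inputs.
a, b = 0x10000, 0x100000

fast = or_maxchar(a, b)   # shortcut result
exact = max(a, b)         # true maximum of the two code points

print(hex(fast), fast > MAX_UNICODE)    # 0x110000 True
print(hex(exact), exact > MAX_UNICODE)  # 0x100000 False
```

Both inputs are valid code points, yet the OR of the two lies just above MAX_UNICODE, whereas a plain max() stays in range.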

--




[issue18183] Calling .lower() on certain unicode string raises SystemError

2013-06-10 Thread Amaury Forgeot d'Arc

Amaury Forgeot d'Arc added the comment:

>>> a = chr(0x84b2e)+chr(0x109710)
>>> a.lower()
SystemError: invalid maximum character passed to PyUnicode_New

The MAX_MAXCHAR() macro only works for 'maxchar' values, like 0xff, 0x...,
but in do_upper_or_lower() it is used with arbitrary UCS4 values.
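
A small sketch of why the distinction matters, assuming the shortcut is a bitwise OR and that the 'maxchar' values in question are 0x7f, 0xff, 0xffff and 0x10ffff (that set is an assumption made for illustration). For those values, each one is a bit mask contained in the next, so OR and max() agree; for arbitrary code points they do not.

```python
from itertools import combinations

MAX_UNICODE = 0x10FFFF

# Assumed set of canonical 'maxchar' values (illustrative, not taken from the thread).
MAXCHAR_VALUES = (0x7F, 0xFF, 0xFFFF, 0x10FFFF)

# For these nested bit masks, OR-ing two values gives the larger one.
or_equals_max = all(
    (x | y) == max(x, y) for x, y in combinations(MAXCHAR_VALUES, 2)
)
print(or_equals_max)  # True

# The two arbitrary code points from the example above.
a, b = 0x84B2E, 0x109710
combined = a | b
print(hex(combined), combined > MAX_UNICODE)  # 0x18df3e True
print(hex(max(a, b)))                         # 0x109710
```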

--
nosy: +amaury.forgeotdarc, haypo




[issue18183] Calling .lower() on certain unicode string raises SystemError

2013-06-10 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 89b106d298a9 by Benjamin Peterson in branch '3.3':
remove MAX_MAXCHAR because it's unsafe for computing maximum codepoitn value 
(see #18183)
http://hg.python.org/cpython/rev/89b106d298a9

New changeset 668aba845fb2 by Benjamin Peterson in branch 'default':
merge 3.3 (#18183)
http://hg.python.org/cpython/rev/668aba845fb2

--
nosy: +python-dev




[issue18183] Calling .lower() on certain unicode string raises SystemError

2013-06-10 Thread Benjamin Peterson

Benjamin Peterson added the comment:

I simply removed the MAX_MAXCHAR micro-optimization, since it seems fairly 
unsafe. Interested parties can restore it safely if they wish.

--
nosy: +benjamin.peterson
resolution:  -> fixed
status: open -> closed




[issue18183] Calling .lower() on certain unicode string raises SystemError

2013-06-10 Thread STINNER Victor

STINNER Victor added the comment:

Oops, my MAX_MAXCHAR macro was too optimized :-) (the result is incorrect)

It shows us that the test suite does not have enough tests for non-BMP characters.

--




[issue18183] Calling .lower() on certain unicode string raises SystemError

2013-06-10 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Here are additional tests for this issue.

--
keywords: +patch
stage: needs patch -> patch review
status: closed -> open
Added file: http://bugs.python.org/file30533/test_issue18183.patch




[issue18183] Calling .lower() on certain unicode string raises SystemError

2013-06-10 Thread STINNER Victor

STINNER Victor added the comment:

+'\U00010000\U00100000'.lower()

Why not check the result of these calls?
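
For illustration only, a sketch of what checking the results could look like. It is not the attached patch. It assumes that U+10000 and U+100000 have no case mappings, so every case method should return the string unchanged; the function name check_non_bmp_case_methods is hypothetical.

```python
# Two non-BMP code points with no case mappings (assumption stated above).
s = '\U00010000\U00100000'


def check_non_bmp_case_methods():
    """Hypothetical check: case methods must return the input unchanged."""
    for name in ('lower', 'upper', 'casefold', 'capitalize', 'title', 'swapcase'):
        assert getattr(s, name)() == s, name
    # Padding with a non-BMP fill character must also work.
    assert '\U00100000'.ljust(3, '\U00010000') == '\U00100000' + '\U00010000' * 2
    assert '\U00100000'.rjust(3, '\U00010000') == '\U00010000' * 2 + '\U00100000'


check_non_bmp_case_methods()
```

On an interpreter that still has the bug, the first call would raise SystemError before any comparison is made.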

--




[issue18183] Calling .lower() on certain unicode string raises SystemError

2013-06-10 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

The result is trivial. Wouldn't checking the result distract attention from
the main issue?

--
