Martin v. Löwis added the comment:

I stand by that comment: IsWhiteSpace should use the Unicode White_Space 
property. Since FS/GS/RS/US are not in the White_Space property, it's correct 
that the int conversion fails. What is incorrect is that .isspace() returns 
True for them.

There are really several bugs here:
- .isspace doesn't use the White_Space property
- int conversion ultimately uses Py_ISSPACE, which conceptually could deviate 
from the Unicode properties (as it is byte-based). This is not really an issue, 
since they indeed match.

I propose to fix this by parsing PropList.txt, and generating 
_PyUnicode_IsWhitespace based on the White_Space property. For efficiency, it 
should also generate a fast-lookup array for the ASCII case, or just use 
_Py_ctype_table (with a comment that this table needs to match PropList 
White_Space). _Py_ascii_whitespace should go.
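
As a rough illustration of the proposal above, here is a minimal Python sketch 
of parsing PropList.txt-style data and deriving both a full White_Space set and 
an ASCII fast-lookup table. It is not the actual generator script: the names 
parse_property, WHITE_SPACE, ASCII_WHITE_SPACE and is_whitespace are 
hypothetical, and the embedded data is only an excerpt of the White_Space 
entries; a real script would read the complete file.

```python
# Excerpt in the format of the Unicode PropList.txt data file (White_Space
# entries only). Assumption: a real generator would read the whole file.
PROPLIST_EXCERPT = """\
0009..000D    ; White_Space # Cc   [5] <control-0009>..<control-000D>
0020          ; White_Space # Zs       SPACE
0085          ; White_Space # Cc       <control-0085>
00A0          ; White_Space # Zs       NO-BREAK SPACE
1680          ; White_Space # Zs       OGHAM SPACE MARK
2000..200A    ; White_Space # Zs  [11] EN QUAD..HAIR SPACE
2028          ; White_Space # Zl       LINE SEPARATOR
2029          ; White_Space # Zp       PARAGRAPH SEPARATOR
202F          ; White_Space # Zs       NARROW NO-BREAK SPACE
205F          ; White_Space # Zs       MEDIUM MATHEMATICAL SPACE
3000          ; White_Space # Zs       IDEOGRAPHIC SPACE
"""


def parse_property(lines, prop="White_Space"):
    """Return the set of code points that carry the property `prop`."""
    result = set()
    for line in lines:
        # Drop the trailing comment and skip blank lines.
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        fields = [field.strip() for field in data.split(";")]
        if len(fields) < 2 or fields[1] != prop:
            continue
        # The first field is either a single code point or "START..END".
        first, _, last = fields[0].partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        result.update(range(start, end + 1))
    return result


# Hypothetical stand-ins for the generated data.
WHITE_SPACE = parse_property(PROPLIST_EXCERPT.splitlines())

# Fast-lookup array for the ASCII case, derived from the same data so the
# two cannot drift apart.
ASCII_WHITE_SPACE = [cp in WHITE_SPACE for cp in range(128)]


def is_whitespace(ch):
    """Sketch of a White_Space-based whitespace check for one character."""
    cp = ord(ch)
    if cp < 128:
        return ASCII_WHITE_SPACE[cp]
    return cp in WHITE_SPACE
```

With a table built this way, FS/GS/RS/US (U+001C..U+001F) are not reported as 
whitespace, while TAB, LF, VT, FF, CR and SPACE are.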

Contributions are welcome.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue18236>
_______________________________________