Ezio Melotti <ezio.melo...@gmail.com> added the comment:

str.strip uses Py_UNICODE_ISSPACE, which in turn uses _PyUnicode_IsWhitespace
(see Objects/unicodetype_db.h#l3347). According to the comment there, it
"Returns 1 for Unicode characters having the bidirectional type 'WS', 'B' or
'S' or the category 'Zs', 0 otherwise."
The category of U+200B is 'Cf' and its bidirectional type is 'BN', so 0 is
returned and the character is not stripped.
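
This can be checked from the interpreter. The sketch below assumes that the
unicodedata module reports the same category and bidirectional type as the
internal table; matches_whitespace_rule is a hypothetical helper that only
restates the quoted comment and is not part of CPython.

```python
import unicodedata

ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE


def matches_whitespace_rule(ch):
    """Hypothetical helper restating the quoted comment:
    bidirectional type 'WS', 'B' or 'S', or category 'Zs'."""
    return (unicodedata.bidirectional(ch) in ("WS", "B", "S")
            or unicodedata.category(ch) == "Zs")


zwsp_category = unicodedata.category(ZWSP)    # general category of U+200B
zwsp_bidi = unicodedata.bidirectional(ZWSP)   # bidirectional type of U+200B
zwsp_matches = matches_whitespace_rule(ZWSP)  # does the quoted rule apply?
zwsp_stripped = ("abc" + ZWSP).strip()        # strip() leaves U+200B in place

print(zwsp_category, zwsp_bidi, zwsp_matches, ZWSP.isspace())
print(len(zwsp_stripped))
```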

OTOH, Unicode defines the White_Space property and assigns it to 26 chars,
whereas _PyUnicode_IsWhitespace includes 4 more chars (U+001C, U+001D, U+001E,
U+001F) that should probably be removed.
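
A short sketch of how those 4 chars behave, again assuming unicodedata agrees
with the internal table. They are control characters (category 'Cc') that
qualify only through their bidirectional type:

```python
import unicodedata

# The four separator control characters discussed above.
EXTRA_CHARS = ["\x1c", "\x1d", "\x1e", "\x1f"]

# Collect (code point, category, bidirectional type, isspace()) for each one.
extra_info = [
    (f"U+{ord(ch):04X}",
     unicodedata.category(ch),
     unicodedata.bidirectional(ch),
     ch.isspace())
    for ch in EXTRA_CHARS
]

for row in extra_info:
    print(*row)

# Because isspace() is true for them, strip() removes them as well.
extra_stripped = "\x1cabc\x1f".strip()
print(repr(extra_stripped))
```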

I'll close this issue because str.strip() is correct regarding U+200B.

@Martin
Do you think those 4 chars should be removed?
If so I'll open another issue.

----------
assignee:  -> ezio.melotti
nosy: +loewis
resolution:  -> invalid
stage:  -> committed/rejected
status: open -> closed

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue13391>
_______________________________________