Ezio Melotti <ezio.melo...@gmail.com> added the comment:

str.strip uses Py_UNICODE_ISSPACE, which in turn uses _PyUnicode_IsWhitespace (see Objects/unicodetype_db.h#l3347). According to the comment there, it "Returns 1 for Unicode characters having the bidirectional type 'WS', 'B' or 'S' or the category 'Zs', 0 otherwise." The category of U+200B is 'Cf' and its bidirectional type is 'BN', so 0 is returned and the character is not stripped.
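The rule quoted above can be checked from Python itself. The following is only a rough sketch: is_whitespace_by_rule is a hypothetical helper (not a CPython function) that approximates the quoted rule with the unicodedata module, on the assumption that unicodedata reflects the same Unicode database version as the interpreter's own tables.

```python
import unicodedata


def is_whitespace_by_rule(ch: str) -> bool:
    """Hypothetical helper: approximate the quoted rule via unicodedata.

    A character counts as whitespace if its bidirectional type is
    'WS', 'B' or 'S', or if its general category is 'Zs'.
    """
    return (
        unicodedata.bidirectional(ch) in ("WS", "B", "S")
        or unicodedata.category(ch) == "Zs"
    )


zwsp = "\u200b"  # ZERO WIDTH SPACE

# Category and bidirectional type of U+200B.
print(unicodedata.category(zwsp), unicodedata.bidirectional(zwsp))

# Neither the approximated rule nor str.isspace() treats it as whitespace.
print(is_whitespace_by_rule(zwsp), zwsp.isspace())

# Consequently str.strip() leaves a trailing U+200B in place.
print(repr(("a" + zwsp).strip()))

# An ordinary space satisfies the rule and is stripped.
print(is_whitespace_by_rule(" "), repr("a ".strip()))
```

If the assumption holds, the helper should agree with str.isspace() on these inputs, and the trailing U+200B should survive strip().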
OTOH, Unicode defines the White_Space property and assigns it to 26 chars, whereas _PyUnicode_IsWhitespace includes 4 more chars (U+001C, U+001D, U+001E, U+001F) that should probably be removed.

I'll close this issue because str.strip() is correct regarding U+200B.

@Martin Do you think those 4 chars should be removed? If so I'll open another issue.

----------
assignee:  -> ezio.melotti
nosy: +loewis
resolution:  -> invalid
stage:  -> committed/rejected
status: open -> closed

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue13391>
_______________________________________