New submission from Ezio Melotti <ezio.melo...@gmail.com>:

On Python 3, strptime raises a ValueError for some "Unicode whitespaces" even when they appear in the same position in both the 'string' and 'format' args:

>>> from time import strptime
>>> strptime("Thu\x20Feb", "%a\x20%b")  # normal space, works fine
time.struct_time(tm_year=1900, tm_mon=2, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=3, tm_yday=32, tm_isdst=-1)
>>> strptime("Thu\xa0Feb", "%a\xa0%b")  # no-break space, fails
ValueError: time data 'Thu\xa0Feb' does not match format '%a\xa0%b'

I wrote a small script to find the other chars where it fails (it needs ~5 minutes to run):

>>> import unicodedata
>>> l = []
>>> for char in map(chr, range(0xFFFF)):
...     try: x = strptime('Thu{0}Feb'.format(char), '%a{0}%b'.format(char))
...     except ValueError: l.append(char)
...
>>> l
['\x1c', '\x1d', '\x1e', '\x1f', '%', '\x85', '\xa0', '\u1680', '\u2000',
 '\u2001', '\u2002', '\u2003', '\u2004', '\u2005', '\u2006', '\u2007',
 '\u2008', '\u2009', '\u200a', '\u200b', '\u2028', '\u2029', '\u202f',
 '\u205f', '\u3000']
>>> [char.strip() for char in l]
['', '', '', '', '%', '', '', '', '', '', '', '', '', '', '', '', '', '',
 '', '', '', '', '', '', '']
>>> [unicodedata.category(char) for char in l]
['Cc', 'Cc', 'Cc', 'Cc', 'Po', 'Cc', 'Zs', 'Zs', 'Zs', 'Zs', 'Zs', 'Zs',
 'Zs', 'Zs', 'Zs', 'Zs', 'Zs', 'Zs', 'Zs', 'Cf', 'Zl', 'Zp', 'Zs', 'Zs',
 'Zs']
>>> [unicodedata.name(char, '???') for char in l]
['???', '???', '???', '???', 'PERCENT SIGN', '???', 'NO-BREAK SPACE',
 'OGHAM SPACE MARK', 'EN QUAD', 'EM QUAD', 'EN SPACE', 'EM SPACE',
 'THREE-PER-EM SPACE', 'FOUR-PER-EM SPACE', 'SIX-PER-EM SPACE',
 'FIGURE SPACE', 'PUNCTUATION SPACE', 'THIN SPACE', 'HAIR SPACE',
 'ZERO WIDTH SPACE', 'LINE SEPARATOR', 'PARAGRAPH SEPARATOR',
 'NARROW NO-BREAK SPACE', 'MEDIUM MATHEMATICAL SPACE', 'IDEOGRAPHIC SPACE']

All these chars (except % and some control chars) are whitespace and are removed by the .strip() method, so I guess that something similar happens in strptime too.
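
Since nearly every failing char is one that .strip() removes, the scan could probably be narrowed to those chars before calling strptime. The following is only a sketch under that assumption: it will miss any failure that is not strip-related (such as '%'), and the helper names whitespace_candidates and failing_separators are made up for this example. Which chars it reports depends on the Python version and its Unicode database.

```python
from time import strptime


def whitespace_candidates(limit=0x10000):
    """Return every character below `limit` that str.strip() removes."""
    return [c for c in map(chr, range(limit)) if c.strip() == '']


def failing_separators(chars):
    """Return the characters for which strptime raises ValueError when the
    same character separates %a and %b in both the data and the format."""
    failed = []
    for c in chars:
        try:
            strptime('Thu{0}Feb'.format(c), '%a{0}%b'.format(c))
        except ValueError:
            failed.append(c)
    return failed


if __name__ == '__main__':
    # Only the whitespace-like characters are tested, not the whole range.
    print(failing_separators(whitespace_candidates()))
```

On an interpreter without the bug the printed list should be empty, since the ordinary space and the other candidates would all parse.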

The Unicode categories are:

"Cc" = "Other, Control"
"Zs" = "Separator, Space"
"Cf" = "Other, Format"
"Zl" = "Separator, Line"
"Zp" = "Separator, Paragraph"

Everything seems to work fine on Py2.x (tested on 2.4 and 2.6).

----------
components: Library (Lib), Unicode
messages: 81859
nosy: ezio.melotti
severity: normal
status: open
title: time.strptime fails to match data and format with Unicode whitespaces (Py3)
type: behavior
versions: Python 3.0, Python 3.1

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue5240>
_______________________________________