New submission from Mark Dickinson <dicki...@gmail.com>:

In Python 3, or in Python 2 with the re.UNICODE flag, it appears that the regex r'\d' matches all Unicode characters with category either 'Nd' (Number, Decimal Digit) or 'No' (Number, Other), but not characters in category 'Nl' (Number, Letter):

Python 3.2a0 (py3k:74188, Jul 23 2009, 16:01:29)
[GCC 4.0.1 (Apple Inc. build 5493)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> import unicodedata
>>> x = '\u2781'
>>> unicodedata.category(x)
'No'
>>> unicodedata.name(x)
'DINGBAT CIRCLED SANS-SERIF DIGIT TWO'
>>> re.match(r'\d', '\u2781')
<_sre.SRE_Match object at 0x3d5d08>

I believe (but am not 100% sure) that r'\d' should only match characters in category 'Nd'. To back up this belief:

(1) int and float currently accept characters in category 'Nd' but not 'No'; it would seem useful for '\d' to match exactly those characters that are accepted by int, so that, e.g., something matched with '\d+' could be passed directly to int. (This came up in a #python-dev discussion about whether the Decimal type should accept other Unicode digits; that's a separate issue, though.)

(2) In Perl 5.10 (and possibly some earlier versions too), '\d' matches only characters in category 'Nd'.

(3) Unicode Technical Standard #18 ("Unicode Regular Expressions") at http://unicode.org/unicode/reports/tr18/ recommends that '\d' should correspond to \p{gc=Decimal_Number}.

Marc-André, do you have any opinion on this?

It's probably slightly dangerous to change this in 2.6 or 3.1; I'm proposing that '\d' should be modified to accept only characters of category 'Nd' in 2.7 and 3.2.

(Thanks Ezio Melotti for finding all the references above and doing Perl testing!)

----------
components: Extension Modules
messages: 90878
nosy: ezio.melotti, lemburg, marketdickinson
severity: normal
stage: test needed
status: open
title: Regex '\d' should not match unicode category 'No'.
type: behavior
versions: Python 2.7, Python 3.2

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue6561>
_______________________________________
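
The category claim in the report above can be checked on whichever interpreter is at hand with a short script. This is only a rough sketch for Python 3: the helper names categories_matched_by_digit_class and int_accepts are made up for illustration and are not part of re or unicodedata. It tallies, per Unicode general category, the code points that r'\d' matches, and separately checks whether int accepts a given character. The tally depends on the Python version being run, so no expected output is shown.

```python
import re
import sys
import unicodedata
from collections import Counter


def categories_matched_by_digit_class():
    # Hypothetical helper: count, per Unicode general category,
    # the single code points that the pattern r'\d' matches.
    digit = re.compile(r'\d')
    counts = Counter()
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if digit.match(ch):
            counts[unicodedata.category(ch)] += 1
    return counts


def int_accepts(ch):
    # Hypothetical helper: True if int() parses the one-character string.
    try:
        int(ch)
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    # Categories seen among characters matched by r'\d' on this interpreter.
    print(dict(categories_matched_by_digit_class()))
    # U+0662 ARABIC-INDIC DIGIT TWO is in category 'Nd';
    # U+2781 is the 'No' character from the session above.
    print(int_accepts('\u0662'), int_accepts('\u2781'))
```

If the only category reported is 'Nd', then everything matched by '\d+' on that interpreter consists of characters int accepts, which is the property argued for in point (1).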