On Fri, Dec 3, 2010 at 12:10 AM, Alexander Belopolsky
<alexander.belopol...@gmail.com> wrote:
..
> I don't think decimal module should support non-European decimal
> digits.  The only place where it can make some sense is in int()
> because here we have a fighting chance of producing a reasonable
> definition.  The motivating use case is conversion of numerical data
> extracted from text using simple '\d+' regex matches.
>
It turns out, this use case does not quite work in Python either:

>>> re.compile(r'\s+(\d+)\s+').match(' \u2081\u2082\u2083 ').group(1)
'₁₂₃'
>>> int(_)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'decimal' codec can't encode character '\u2081' in position 0: invalid decimal Unicode string

This may actually be a bug in Python's regex implementation, because
the Unicode standard seems to recommend that '\d' be interpreted as
gc = Decimal_Number (Nd):

http://unicode.org/reports/tr18/#Compatibility_Properties

I actually wonder if Python's re module can claim to provide even
Basic Unicode Support.

http://unicode.org/reports/tr18/#Basic_Unicode_Support

> Here is how I would do it:
>
> 1. String x of non-European decimal digits is only accepted in
> int(x), but not by int(x, 0) or int(x, 10).
> 2. If x contains one or more non-European digits, then
>
> (a) all digits must be from the same block:
>
>     def basepoint(c):
>         return ord(c) - unicodedata.digit(c)
>     all(basepoint(c) == basepoint(x[0]) for c in x) -> True
>
> (b) and '+' or '-' sign is not allowed.
>
> 3. A character c is a digit if it matches '\d' regex.  I think this
> means unicodedata.category(c) -> 'Nd'.
>
> Condition 2(b) is important because there is no clear way to define
> what is acceptable as '+' or '-' using Unicode character properties
> and not all number systems even support local form of negation.  (It
> is also YAGNI.)
>
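
For illustration only, rules 2(a), 2(b) and 3 quoted above could be sketched roughly like this. The name strict_int is hypothetical and not a proposed API. The sketch assumes rule 3 means unicodedata.category(c) == 'Nd', and it hands pure-ASCII input to the builtin int() unchanged, which is my reading of rule 1 rather than something spelled out in the proposal.

```python
import unicodedata


def basepoint(c):
    # Code point of the zero digit in the block that c belongs to.
    return ord(c) - unicodedata.digit(c)


def strict_int(x):
    """Hypothetical helper: a sketch of the quoted rules, not int() itself."""
    # Assumption: plain ASCII input keeps the existing int() behaviour,
    # including signs and surrounding whitespace.
    if all(ord(c) < 128 for c in x):
        return int(x)

    # Rule 3 (assumed reading): only category Nd counts as a digit.
    # This also enforces rule 2(b), since '+' and '-' are not Nd.
    for c in x:
        if unicodedata.category(c) != 'Nd':
            raise ValueError('not a decimal digit: %r' % c)

    # Rule 2(a): every digit must come from the same block.
    base = basepoint(x[0])
    if any(basepoint(c) != base for c in x):
        raise ValueError('digits from more than one block: %r' % x)

    value = 0
    for c in x:
        value = value * 10 + unicodedata.digit(c)
    return value


# ARABIC-INDIC DIGIT ONE, TWO, THREE
print(strict_int('\u0661\u0662\u0663'))  # 123
```

Under these assumptions a string that mixes blocks (say, an Arabic-Indic digit followed by a Devanagari one) is rejected, as is a signed non-European string. The subscript digits from the session above are rejected too: unicodedata.category('\u2081') is 'No', not 'Nd', even though unicodedata.digit('\u2081') returns 1.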