On Mon, Nov 29, 2010 at 2:38 PM, Alexander Belopolsky
<alexander.belopol...@gmail.com> wrote:
..
>> Still, if it's not detrimental and it's not difficult to support,
>> then why do you care?
>
> It is difficult to support.  A fix for issue10557 would be much
> simpler if we did not support non-European digits.  I now added a
> patch that handles non-ascii digits, so you can see what's involved.
> Note that when Unicode Consortium inevitably adds more Nd characters
> to the non-BMP planes, we will have to add surrogate pairs' support to
> this code.
>
It turns out that this did in fact happen:

# Newly assigned in Unicode 3.1.0 (March, 2001)
..
1D7CE..1D7FF  ; 3.1 #  [50] MATHEMATICAL BOLD DIGIT ZERO..MATHEMATICAL MONOSPACE DIGIT NINE

See http://unicode.org/Public/UNIDATA/DerivedAge.txt

And of course,

>>> unicodedata.digit('\U0001D7CE')
0

but

>>> int('\U0001D7CE')
..
UnicodeEncodeError: 'decimal' codec can't encode character '\ud835' ..

on a narrow Unicode build.  (Note the character reported in the error
message!)

If you think non-ASCII digits are not difficult to support, please
contribute to the following tracker issues:

http://bugs.python.org/issue10581 (Review and document string format
accepted in numeric data type constructors)

http://bugs.python.org/issue10557 (Malformed error message from float())

http://bugs.python.org/issue10435 (Document unicode C-API in reST -
Specifically, PyUnicode_EncodeDecimal)

http://bugs.python.org/issue8646 (PyUnicode_EncodeDecimal is undocumented)

http://bugs.python.org/issue6632 (Include more fullwidth chars in the
decimal codec)

and back to the issue of user confusion

http://bugs.python.org/issue652104 [closed/invalid] (int(u"\u1234")
raises UnicodeEncodeError by Guido van Rossum)
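
To make the '\ud835' in that error message less mysterious, here is a
rough sketch of what a narrow build is working with.  The helper names
(to_surrogates, digits_to_int) are made up for illustration and are not
part of CPython or of any patch on the tracker; the conversion loop only
approximates what surrogate-aware digit handling would need to do, using
unicodedata rather than the C-level decimal codec.

```python
import unicodedata


def to_surrogates(ch):
    """Return the UTF-16 (high, low) surrogate pair for a non-BMP character.

    Hypothetical helper: a narrow build stores such a character as these
    two code units, which is why the error above names '\\ud835'.
    """
    offset = ord(ch) - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def digits_to_int(s):
    """Convert a string of Unicode decimal digits (category Nd) to an int.

    Hypothetical helper: unicodedata.decimal() raises ValueError for any
    character that is not a decimal digit.
    """
    value = 0
    for ch in s:
        value = value * 10 + unicodedata.decimal(ch)
    return value


zero = '\U0001D7CE'  # MATHEMATICAL BOLD DIGIT ZERO
high, low = to_surrogates(zero)
print(hex(high), hex(low))         # the first value is the '\ud835' from the error
print(unicodedata.category(zero))  # Nd
print(digits_to_int('\U0001D7CF\U0001D7D0'))  # bold digits ONE and TWO
```

The point of the sketch is only that the digit's value lives in the pair
as a whole: code that looks at one code unit at a time sees a lone
surrogate, which is not a digit at all.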