Alexander Belopolsky <belopol...@users.sourceforge.net> added the comment:

I am submitting a patch (issue10557b.diff) for commit review.  As Marc 
suggested, decimal conversion is now performed on Py_UNICODE characters. For 
this purpose, I introduced a _PyUnicode_NormalizeDecimal() function that takes 
Py_UNICODE and returns a PyUnicode object with whitespace stripped and 
non-ASCII digits converted to ASCII equivalents.  The PyUnicode_EncodeDecimal() 
function is no longer used and I added a comment recommending that 
_PyUnicode_NormalizeDecimal() be used instead. I would like to eventually 
remove PyUnicode_EncodeDecimal(), but I am not sure about the proper 
deprecation procedures for undocumented C APIs.
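
To illustrate the normalization described above, here is a rough pure-Python 
sketch.  It is not the C code from the patch: normalize_decimal() is a 
hypothetical name, and the use of unicodedata.decimal() to recognize digits is 
my assumption about an equivalent lookup.

```python
import unicodedata


def normalize_decimal(s):
    """Hypothetical analogue of the C helper: strip surrounding whitespace
    and map every Unicode decimal digit to its ASCII equivalent."""
    out = []
    for ch in s.strip():
        # unicodedata.decimal() returns the digit value, or the default
        # (None here) when the character is not a decimal digit.
        value = unicodedata.decimal(ch, None)
        out.append(ch if value is None else str(value))
    return "".join(out)
```

Characters that are not decimal digits pass through unchanged, so the parser 
still sees them and can report them in its own error message.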

As a result, int(), float(), etc. will no longer raise UnicodeDecodeError unless 
given a string with lone surrogates.  (This error comes from the UTF-8 codec 
that is applied after digit normalization.)

A few error cases such as embedded '\0' and non-digit characters with ord(c) > 
255 will now raise ValueError instead of UnicodeDecodeError.  Since 
UnicodeDecodeError is a subclass of ValueError, it is unlikely that existing 
code would attempt to differentiate between the two.  It is possible to achieve 
complete compatibility, but it is hard to justify reporting different error 
types on non-digit characters below and above code point 255.
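
The compatibility argument can be sketched as follows.  The parse_int() helper 
is hypothetical and only stands in for typical caller code: a handler written 
for ValueError covers both exception types, so the change in the reported type 
should be invisible to it.

```python
# UnicodeDecodeError derives from UnicodeError, which derives from ValueError.
assert issubclass(UnicodeDecodeError, ValueError)


def parse_int(text):
    """Hypothetical caller: return an int, or None if text is not a number."""
    try:
        return int(text)
    except ValueError:
        # Catches plain ValueError and any UnicodeError subclass alike.
        return None
```

Only code that catches UnicodeDecodeError specifically, and not ValueError, 
would notice the difference.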

The patch contains tests for error messages that I tried to make robust by only 
requiring that s.strip() be found somewhere in the error message from int(s).  
Note that because this patch strips whitespace before the string is passed to 
the parser, the parser errors do not contain the whitespace.  This may actually 
be desirable because it helps the user see the source of the error without 
being distracted by irrelevant whitespace.
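
The style of check described above looks roughly like this.  It is a minimal 
sketch, not the test code from the patch; error_mentions_input() is a 
hypothetical helper and the sample strings are my own.

```python
def error_mentions_input(s):
    """Return True if int(s) fails and the message contains s.strip().

    Returns False if int(s) unexpectedly succeeds.
    """
    try:
        int(s)
    except ValueError as exc:
        # Only require the stripped input to appear somewhere in the
        # message, so the check holds whether or not the message keeps
        # the surrounding whitespace.
        return s.strip() in str(exc)
    return False
```

Because the check does not depend on the exact wording of the message, it 
should keep passing if the message text is later reworded.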

----------
assignee:  -> belopolsky
stage: unit test needed -> commit review

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue10557>
_______________________________________