[issue10557] Malformed error message from float()

Marc-Andre Lemburg Thu, 02 Dec 2010 13:40:58 -0800

Marc-Andre Lemburg <m...@egenix.com> added the comment:

Alexander Belopolsky wrote:
> 
> Alexander Belopolsky <belopol...@users.sourceforge.net> added the comment:
> 
> I am submitting a patch (issue10557b.diff) for commit review.  As Marc 
> suggested, decimal conversion is now performed on Py_UNICODE characters. For 
> this purpose, I introduced _PyUnicode_NormalizeDecimal() function that takes 
> Py_UNICODE and returns a PyUnicode object with whitespace stripped and 
> non-ASCII digits converted to ASCII equivalents.  The 
> PyUnicode_EncodeDecimal() function is no longer used and I added a comment 
> recommending that _PyUnicode_NormalizeDecimal() be used instead. I would like 
> to eventually remove PyUnicode_EncodeDecimal(), but I am not sure about the 
> proper deprecation procedures for undocumented C APIs.
> 
> As a result, int(), float(), etc will no longer raise UnicodeDecodeError 
> unless given a string with lone surrogates.  (This error comes from UTF-8 
> codec that is applied after digit normalization.)
> 
> A few error cases such as embedded '\0' and non-digit characters with ord(c) 
> > 255 will now raise ValueError instead of UnicodeDecodeError.  Since 
> UnicodeDecodeError is a subclass of ValueError, it is unlikely that existing 
> code would attempt to differentiate between the two.  It is possible to 
> achieve complete compatibility, but it is hard to justify reporting different 
> error types on non-digit characters below and above code point 255.
> 
> The patch contains tests for error messages that I tried to make robust by 
> only requiring that s.strip() be found somewhere in the error message from 
> int(s).  Note that since in this patch whitespace is stripped before the 
> string is passed to the parser, the parser errors do not contain the 
> whitespace.  This may actually be desirable because it helps the user to see 
> the source of the error without being distracted by irrelevant white space.
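
For illustration, the behaviour described in the quoted text can be checked from Python. This is a minimal sketch that assumes a current CPython 3 interpreter, not the patched build under review; conversion_error() is a hypothetical helper introduced only for this example.

```python
# Unicode decimal digits are accepted by the numeric constructors.
arabic_indic = "\u0661\u0662\u0663"  # ARABIC-INDIC DIGITS ONE, TWO, THREE
assert int(arabic_indic) == 123
assert float("\u0661.\u0665") == 1.5

# UnicodeDecodeError is a ValueError subclass, so a single
# "except ValueError" handler covers both error types.
assert issubclass(UnicodeDecodeError, ValueError)


def conversion_error(s):
    """Return the message raised by int(s), or None if s parses.

    Hypothetical helper, used only for this illustration.
    """
    try:
        int(s)
    except ValueError as exc:
        return str(exc)
    return None


# An embedded NUL is rejected with a ValueError (or a subclass of it).
assert conversion_error("12\0") is not None

# The robust check described above: only require that s.strip()
# appears somewhere in the error message.
for bad in ["  12x  ", "1\u20ac2"]:
    message = conversion_error(bad)
    assert message is not None
    assert bad.strip() in message
```

Checking for s.strip() rather than the full string keeps the test valid whether or not the message includes the surrounding whitespace.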


Thanks for the patch. I've had a quick look...

Some comments:

 * Please change the API _PyUnicode_NormalizeDecimal() to
   PyUnicode_ConvertToASCIIDecimal() - that's closer to what
   it does.

 * Don't have the API remove any whitespace. It should just
   work on decimal digit code points (changing the length
   of the Unicode string is a bad idea).

 * Please remove the note "This function is no longer used.
   Use _PyUnicode_NormalizeDecimal instead." from the
   PyUnicode_EncodeDecimal() API description in the
   header file. The API won't go away (it does have its
   use and is being used in 3rd party extensions) and
   you cannot guide people to use a Python private API.

 * Please double check the ref counts. I think you have a leak
   in PyLong_FromUnicode() (for norm) and possibly in other
   functions as well.
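
To make the second point concrete, here is a rough Python sketch of the length-preserving conversion being asked for. It is not the C implementation from the patch; convert_to_ascii_decimal() is a hypothetical name chosen for this example, and it assumes that "decimal digit code points" means characters for which unicodedata.decimal() has a value.

```python
import unicodedata


def convert_to_ascii_decimal(text):
    """Map every Unicode decimal digit to its ASCII equivalent.

    Hypothetical illustration: all other characters, including
    whitespace, are passed through unchanged, so the result has
    the same number of code points as the input.
    """
    out = []
    for ch in text:
        value = unicodedata.decimal(ch, None)  # None if ch is not a decimal digit
        out.append(str(value) if value is not None else ch)
    return "".join(out)


sample = " \u0661\u0662x "  # space, ARABIC-INDIC ONE, TWO, "x", space
converted = convert_to_ascii_decimal(sample)
assert converted == " 12x "
assert len(converted) == len(sample)
```

Because whitespace is left alone, stripping stays with the caller, and error messages can still refer to the original string positions.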

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue10557>
_______________________________________