Alexander Belopolsky added the comment:
As a design principle, "accept what's unambiguous in any locale" is reasonable,
but it is hard to apply consistently. I agree that the status quo is
hard to defend. After a long discussion, it was agreed that fullwidth
digits should be accepted, so float(u'１２３') is now valid, but
float('＋１２３'), float('－１２３') and float('１２⒊') are not. The last example is:
>>> '\N{FULLWIDTH DIGIT ONE}\N{FULLWIDTH DIGIT TWO}\N{DIGIT THREE FULL STOP}'
'１２⒊'
All these variations can be neatly addressed by applying NFKC or NFKD
normalization to unicode data before conversion:
>>> import unicodedata
>>> float(unicodedata.normalize('NFKD', '＋１２３'))
123.0
>>> float(unicodedata.normalize('NFKD', '－１２３'))
-123.0
>>> float(unicodedata.normalize('NFKC', '１２⒊'))
123.0
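A minimal sketch of that approach wrapped as a reusable helper. The name `to_float` is hypothetical (it is not a stdlib function); NFKC is used throughout, and for these inputs NFKD gives the same result because the folded output is plain ASCII.

```python
import unicodedata


def to_float(text: str) -> float:
    """Hypothetical helper: parse *text* as a float after NFKC normalization.

    NFKC folds compatibility characters such as FULLWIDTH PLUS SIGN,
    FULLWIDTH HYPHEN-MINUS, fullwidth digits and DIGIT THREE FULL STOP
    into their ASCII equivalents before float() sees them.
    """
    return float(unicodedata.normalize("NFKC", text))


print(to_float("＋１２３"))  # 123.0
print(to_float("－１２３"))  # -123.0
print(to_float("１２⒊"))    # 123.0
```

The helper only widens what the caller accepts; float() itself is unchanged.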
This would even allow parsing fullwidth hexadecimal numbers:
>>> float.fromhex(unicodedata.normalize('NFKC', '０ｘ⒈７ｐ３'))
11.5
>>> int(unicodedata.normalize('NFKC', '７Ｆ'), 16)
127
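but it would not help with '\N{MINUS SIGN}', which has no compatibility
decomposition and therefore survives NFKC/NFKD normalization unchanged.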
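The MINUS SIGN limitation can be checked directly with the unicodedata module. FULLWIDTH HYPHEN-MINUS carries a `<wide>` mapping to '-', while U+2212 has an empty decomposition, so normalization folds the digits but keeps the sign:

```python
import unicodedata

minus = "\N{MINUS SIGN}"                  # U+2212
fullwidth = "\N{FULLWIDTH HYPHEN-MINUS}"  # U+FF0D

# FULLWIDTH HYPHEN-MINUS has a <wide> compatibility mapping to '-',
# so NFKC/NFKD fold it; MINUS SIGN has no decomposition at all.
print(unicodedata.decomposition(fullwidth))    # <wide> 002D
print(repr(unicodedata.decomposition(minus)))  # ''

# Normalization folds the fullwidth digits but leaves U+2212 in place,
# so float() would still see a non-ASCII sign character.
normalized = unicodedata.normalize("NFKC", minus + "１２３")
print(normalized == "\N{MINUS SIGN}123")       # True
```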
Allowing '\N{MINUS SIGN}' is particularly attractive because Unicode text
should arguably prefer it to the ambiguous '\N{HYPHEN-MINUS}'; but by the same
token, fractions.Fraction() should then accept '\N{FRACTION SLASH}' in addition
to the legacy '\N{SOLIDUS}'.
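One way to cover both characters is an explicit translation table applied after NFKC. This is a sketch under assumptions: `to_fraction` and `_EXTRA` are hypothetical names, and mapping MINUS SIGN to '-' and FRACTION SLASH to '/' is a policy choice, not current stdlib behaviour. As a side effect, vulgar fractions such as '½' become parseable, since NFKC turns '½' into '1', FRACTION SLASH, '2'.

```python
import unicodedata
from fractions import Fraction

# Hypothetical extra mapping for characters that NFKC leaves alone
# but that denote ASCII operators in numeric text.
_EXTRA = str.maketrans({
    "\N{MINUS SIGN}": "-",
    "\N{FRACTION SLASH}": "/",
})


def to_fraction(text: str) -> Fraction:
    """Hypothetical helper: NFKC-normalize, apply _EXTRA, then parse."""
    return Fraction(unicodedata.normalize("NFKC", text).translate(_EXTRA))


print(to_fraction("½"))                                   # 1/2
print(to_fraction("\N{MINUS SIGN}３\N{FRACTION SLASH}４"))  # -3/4
```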
Overall, I think this situation calls for a PEP-size proposal and discussion
about handling Unicode numerical data throughout the stdlib, rather than a
case-by-case discussion of the various quirks in the current version.
----------
_______________________________________
Python tracker
<http://bugs.python.org/issue6632>
_______________________________________