Marc-Andre Lemburg added the comment:

On 12.06.2013 07:32, Alexander Belopolsky wrote:
> 
> Alexander Belopolsky added the comment:
> 
> It looks like we a approaching consensus on some points:
> 
> 1. Mixed script numerals should be disallowed.
> 2. '\N{MINUS SIGN}' should be accepted as an alternative to '\N{HYPHEN-MINUS}'
> 
> Open question: should we accept fullwidth + and -, sub/superscript variants 
> etc.?  I believe rather than debating variant codepoints one by one, we 
> should consider applying NFKC (compatibility) normalization to unicode 
> strings to be interpreted as numbers.  This would allow parsing strings like 
> this:
> 
>>>> float(normalize('NFKC', '\N{FULLWIDTH HYPHEN-MINUS}\N{DIGIT ONE FULL 
>>>> STOP}\N{FULLWIDTH DIGIT TWO}'))
> -1.2

While it would solve these cases, I think that would cause a
significant performance hit.

Perhaps we could do this in two phases:
1. detect whether the string uses non-ASCII digits and symbols
2. if it does, apply normalization and then use the decimal codec

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue10581>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to