Marc-Andre Lemburg added the comment:

On 12.06.2013 07:32, Alexander Belopolsky wrote:
>
> Alexander Belopolsky added the comment:
>
> It looks like we are approaching consensus on some points:
>
> 1. Mixed script numerals should be disallowed.
> 2. '\N{MINUS SIGN}' should be accepted as an alternative to '\N{HYPHEN-MINUS}'
>
> Open question: should we accept fullwidth + and -, sub/superscript variants
> etc.? I believe rather than debating variant codepoints one by one, we
> should consider applying NFKC (compatibility) normalization to unicode
> strings to be interpreted as numbers. This would allow parsing strings like
> this:
>
>>>> float(normalize('NFKC', '\N{FULLWIDTH HYPHEN-MINUS}\N{DIGIT ONE FULL STOP}\N{FULLWIDTH DIGIT TWO}'))
> -1.2
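Point 1 in the quoted message (disallowing mixed-script numerals) could be checked roughly as follows. This is a minimal sketch, not code from CPython: `digit_zeros` and `is_mixed_script` are hypothetical names, and it assumes that "script" means "contiguous run of ten decimal digits". Unicode assigns every Nd digit set as a contiguous run 0..9, so `ord(ch) - unicodedata.decimal(ch)` identifies the run. Note this is stricter than Unicode's Script property: ASCII and fullwidth digits would count as mixed here.

```python
import unicodedata


def digit_zeros(s: str) -> set:
    """Return the code point of DIGIT ZERO for each decimal-digit run used in s."""
    zeros = set()
    for ch in s:
        d = unicodedata.decimal(ch, None)  # None for non-decimal characters
        if d is not None:
            # Nd digits come in contiguous runs 0..9, so subtracting the
            # digit's value yields the code point of that run's zero.
            zeros.add(ord(ch) - d)
    return zeros


def is_mixed_script(s: str) -> bool:
    # Assumption: digits from more than one run count as "mixed script".
    return len(digit_zeros(s)) > 1
```

For example, ASCII '1' followed by ARABIC-INDIC DIGIT TWO ('\u0662') is reported as mixed, while two fullwidth digits are not.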
While it would solve these cases, I think normalizing every string would cause a significant performance hit.

Perhaps we could do this in two phases:

1. detect whether the string uses non-ASCII digits and symbols
2. if it does, apply normalization and then use the decimal codec

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue10581>
_______________________________________
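A minimal Python-level sketch of the two-phase idea, assuming it is acceptable to approximate CPython's C-level number parsing: `parse_float` is a hypothetical name, `str.isascii()` stands in for the phase-1 detection, and calling `float()` on the normalized string stands in for the decimal codec step. U+2212 MINUS SIGN has no compatibility decomposition, so NFKC alone does not cover point 2 of the quoted consensus; the sketch maps it explicitly.

```python
import unicodedata


def parse_float(s: str) -> float:
    # Phase 1: the common case. Pure-ASCII input goes straight to float()
    # and never pays for normalization.
    if s.isascii():
        return float(s)
    # Phase 2: non-ASCII input is NFKC-normalized first, which folds
    # fullwidth forms and characters such as DIGIT ONE FULL STOP to ASCII.
    t = unicodedata.normalize('NFKC', s)
    # MINUS SIGN is left unchanged by NFKC, so map it to HYPHEN-MINUS.
    t = t.replace('\N{MINUS SIGN}', '-')
    return float(t)
```

With this sketch, `parse_float('\N{FULLWIDTH HYPHEN-MINUS}\N{DIGIT ONE FULL STOP}\N{FULLWIDTH DIGIT TWO}')` returns `-1.2`, matching the quoted example, while ASCII strings skip normalization entirely.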