Re: [Python-Dev] Python and the Unicode Character Database

Eric Smith Thu, 02 Dec 2010 14:09:17 -0800

On 12/2/2010 4:48 PM, "Martin v. Löwis" wrote:

On 02.12.2010 22:30, Steven D'Aprano wrote:

Martin v. Löwis wrote:

Then these users should speak up and indicate their need, or somebody
should speak up and confirm that there are users who actually want
'١٢٣٤.٥٦' to denote 1234.56. To my knowledge, there is no writing
system in which '١٢٣٤.٥٦e4' means 12345600.0.

I'm not sure what you're after here.


That the current float() constructor accepts tons of bogus character
strings and accepts them as numbers, and that it should stop doing so.


What bogus characters do the float() and int() constructors accept? As
far as I can see, they only accept numerals.


Not bogus characters, but bogus character strings. E.g. strings that mix
digits from different scripts, and mix them with the Python decimal
separator.
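
A minimal sketch of the kind of input under discussion, assuming the
behaviour of CPython 3 as I understand it (float() accepts any Unicode
decimal digit). The variable names and the helper scripts_of_digits()
are made up for this illustration and are not part of any library:

```python
import unicodedata

# '١٢٣٤.٥٦' written with escapes: Arabic-Indic digits 1 2 3 4, ASCII dot, 5 6
arabic_indic = "\u0661\u0662\u0663\u0664.\u0665\u0666"

# ASCII '1', Arabic-Indic 2, Devanagari 3, ASCII dot, Devanagari 4
mixed = "1\u0662\u0969.\u096a"


def scripts_of_digits(text):
    """Return the script prefix of the Unicode name of each decimal digit.

    Hypothetical helper: 'ARABIC-INDIC DIGIT TWO' yields 'ARABIC-INDIC';
    the plain 'DIGIT ONE' has no prefix and is reported as 'ASCII'.
    """
    scripts = set()
    for ch in text:
        if ch.isdecimal():
            prefix = unicodedata.name(ch).rpartition("DIGIT")[0].strip()
            scripts.add(prefix or "ASCII")
    return scripts


print(float(arabic_indic))          # 1234.56
print(float(arabic_indic + "e4"))   # 12345600.0
print(float(mixed))                 # 123.4
print(sorted(scripts_of_digits(mixed)))
```

The last string mixes three scripts with the ASCII dot and is still
accepted, which is the case described above.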

Notice that Python does *not* currently support printing numbers in
other scripts - even though this may actually be more useful than
parsing.


Lack of one function, even if more useful, does not imply that an
existing function should be removed.


No. But if the specific function(ality) is not useful and
underspecified, it should be removed.

So your problems with the current behaviour are:

(1) in some unspecified way, it's not done correctly;


No. My main concern is that it is not properly specified. If it was
specified, I could then tell you what precisely is wrong about it.
Right now, I can only give examples for input that it should not accept,
and examples of input that it should, but does not accept.

(2) it belongs somewhere other than float() and int().


That's only because it also needs a parameter to specify what syntax to
follow, somehow. That parameter could be explicit or implicit, and it
could be to float or to some other function. But it must be available,
and is not.

That second is awfully close to bike-shedding. Since you accept that
Python *should* have the current behaviour


No, I don't. I think it behaves incorrectly, accepting garbage input and
guessing some meaning out of it.

- how the current behaviour is incorrect;


See above: it accepts strings that do not denote real numbers in any
writing system, and, despite the claim that the feature is there to
support other writing systems, actually does not truly support other
writing systems.

- your suggestions for correcting it; and


Make the current implementation exactly match the current documentation.
I think the documentation is correct; the implementation is wrong.

- a concrete suggestion for where you would like to see the behaviour
moved to, and why that would be better than where it currently is.


The current behavior should go nowhere; it is not useful. Something very
similar to the current behavior (but done correctly) should go into the
locale module.
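
One way to picture "done correctly" is a parser that takes the syntax
as a parameter and rejects mixed scripts. The following is only a
sketch under my own assumptions: parse_decimal() and its decimal_point
parameter are hypothetical, nothing like it exists in the locale
module, and signs, grouping and exponents are deliberately left out:

```python
import unicodedata


def parse_decimal(text, decimal_point="."):
    """Hypothetical strict parser, not an existing API.

    Accepts digits from exactly one script plus an explicitly chosen
    decimal separator; anything else raises ValueError.
    """
    int_part, _sep, frac_part = text.partition(decimal_point)
    digits = int_part + frac_part
    if not digits or not digits.isdecimal():
        raise ValueError("not a plain decimal number: %r" % text)

    # Each script's digits form a run of ten code points, so code point
    # minus digit value identifies the script's zero digit.
    zeros = {ord(ch) - unicodedata.decimal(ch) for ch in digits}
    if len(zeros) != 1:
        raise ValueError("digits from more than one script: %r" % text)

    def to_ascii(part):
        return "".join(str(unicodedata.decimal(ch)) for ch in part)

    return float(to_ascii(int_part) + "." + to_ascii(frac_part))


# Arabic-Indic digits with U+066B ARABIC DECIMAL SEPARATOR
print(parse_decimal("\u0661\u0662\u0663\u0664\u066b\u0665\u0666",
                    decimal_point="\u066b"))
```

With this sketch, the mixed-script string and the 'e4' form both raise
ValueError instead of being guessed at.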

I agree with everything Martin says here. I think the basic premise is:
you won't find strings "in the wild" that use non-ASCII digits but do
use the ASCII dot as a decimal point. And that's what float() is looking
for. (And that doesn't even begin to address what it expects for an
exponent 'e'.)


Eric.