[Python-ideas] Re: Fix statistics.median()?

Richard Damon Mon, 30 Dec 2019 14:06:37 -0800

On 12/30/19 4:22 PM, Andrew Barnert via Python-ideas wrote:

On Dec 30, 2019, at 06:50, Richard Damon <rich...@damon-family.org> wrote:

On 12/30/19 12:06 AM, Andrew Barnert wrote:

On Dec 29, 2019, at 20:04, Richard Damon <rich...@damon-family.org> wrote:
Thus your total_order, while not REALLY a total order, is likely good enough 
for most purposes.

Well, it is a total order of equivalence classes (with all IEEE-equal values 
being equivalent, all negative NaNs being equivalent, and all positive NaNs 
being equivalent).


But whether it’s good enough for most purposes depends on what those purposes 
are.

I’m having a hard time imagining any purposes where “some well-defined but 
arbitrary, unnatural, and idiosyncratic total order of equivalence classes of 
the binary64 values” is of any use at all. Maybe that’s just a failure of my 
imagination. But “the specific well-defined but arbitrary, unnatural order that 
almost every other similar bit of code in every language is likely to use” 
seems a lot more likely to be useful.

(And, needless to say, so does “this specific order that I defined myself”. I 
don’t know why the OP wanted to treat all NaNs as equivalents values greater 
than inf, but presumably he does know, and that’s fine. So, I don’t think that 
needs to be built into the module as one of the options, but I don’t see why he 
shouldn’t be able to specify it explicitly.)

But it DOESN'T make all positive NaNs equivalent, as they do not compare 'equal'. 
Neither is less than the other, but they don't compare equal. If you use a sort 
function (not the one that Python currently uses) that uses an equality test as part 
of its sort (rather than just letting the falsehood of a < b and b< a imply 
that a == b) then it will break and perhaps return illogical results. This is getting 
into the weeds of corner cases, so may not be important.

Yes it does. If all positive NaNs sort the same with this key function and the 
Python sorted function, they form an equivalence class under that function. The 
fact that they wouldn’t form an equivalence class under a different function is 
irrelevant. It’s like arguing that 1 and 8 can’t be in the same equivalence 
class for mod 7 addition because they’re not equivalent under mod 11 addition.

And I don’t even know what you’re trying to argue here. That we shouldn’t allow 
this key function because it wouldn’t provide a total order if someone 
monkeypatched builtins to replace sorted? And therefore we should invent some 
“better” order that nobody has asked for and offer that?

The issue is that the functions generates keys of the form (1, nan) forNaNs, and two of these do not compare equal, so not an equivalencyclass. IF the sort routine only uses <, and assumes that if neither a <b or b < a then they are equal, but not all sort routine use thatmethod. I think the current sorted function just uses <, so it works asa key, it just isn't guaranteed to work for all sort algorithms.Basically, if a sort ever compares with a Nan using normal comparisonrules, you have opened the door to problems. In this case we will onlyever compare a NaN to another NaN, and depending on the sort routine, itstill might work, and if the algorithm never uses an equality test tosee that they aren't equal (and few sorts actually test for equality) itwon't cause a problem.

One simple solution is to just make the nan tuples be (1, 0) or (-1, 0)and just make all NaNs a real equivalence class. If you want NaN to sortby their representation, then you could could make that the second termof the tuple (as an int)

I will admit that if you want to sort a list that contains a mix of floats, 
Decimals, and Rationals, then the tuple approach might be better then the 
conversion to a 64 bit int as the key, since that doesn't work for the other 
types except by converting them to float first.

Better for what?

If there’s no good use for the OP’s order or the tuple order or IEEE total 
order, we shouldn’t try to provide any of them; figuring out which one “makes 
sense” or coming up with one that “makes more sense” when we don’t even know 
what case we’re trying to make sense of is pointless. Inventing a new order 
that nobody asked for and nobody can even imagine a use for doesn’t help 
anyone, it’s just more code to write and maintain and one more option to 
confuse users with that benefits nobody. It doesn’t matter what properties that 
new order has.

If there is a good use for any of them, on the other hand… inventing a 
different order that isn’t one any of them asked for still doesn’t help anyone. 
We need to provide what they actually asked for, or give them a way to do so 
(most obviously, with a key parameter), or just reject their use as unimportant.

Building the 64 bit int key to sort floats by IEEE total_order onlyworks if all the values are floats, as that key won't compare properlywith other types of numbers. If your data does have other types ofnumbers, than making the tuple to force NaNs into their right place, andlet the normal value comparison work for normal numbers (or infinitieswhich I think get compared right).

As to why NaNs are greater than infinity, that comes from the IEEE 
specification on the total_order relationship of floating point numbers.

So what? If you care about IEEE total order, you want IEEE total order, not an 
unspecified idiosyncratic order that’s similar to it in some ways.

The question was why were NaNs sorted this way, and the basic answer isto get a usable order for sorting, you have to put them somewhere, andthe IEEE total order definition is as good as any. Ultimately, noposition would make sense universally. The order that IEEE chose is 1)a simple order to create (being based fairly simply on therepresentation for binary floating point), and 2) putting NaNs at theextreme may make it easier to remove them if that is what is desired.

The order he generates is very close to the IEEE total order, thedifference are:

1) It doesn't seperate -0 for +0, which shouldn't matter for mostapplications.

2) It doesn't provide an order between NaNs, but unless you are takingspecial measures to distinguish the NaNs anyway, that doesn't really matter.


--
Richard Damon
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/NWLJBR2ZO4ZJ3FBMKSIN6PHSZXPIV2DF/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-ideas] Re: Fix statistics.median()?

Reply via email to