On 12/26/19 3:14 PM, David Mertz wrote:
Maybe we can just change the function signature:

statistics.median(it, do_wrong_ass_thing_with_nans=False)

:-)

But yes, the problem is really with sorted(). However, the implementation of statistics.median() doesn't HAVE TO use sorted(), that's just one convenient way to do it.
Yes, median could do the sort some other way, and in fact the code for median has a comment about investigating other ways to do it. The fact that median doesn't actually need the full list sorted says that

There IS NO right answer for `sorted([nan, 1, 2, 3])`. However, there is a very plausibly right answer for `statistics.median([nan, 1, 2, 3])` ... or rather, both 'nan' and '2' are plausible (one approach is what Numpy does, the other is what Pandas does).
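To make the sorted() point concrete, here is a minimal sketch. It only assumes standard IEEE 754 float behavior, as exposed through `math.nan`. The outputs of the two sorted() calls are deliberately not shown, since they depend on the input order and on the sort implementation.

```python
import math

nan = math.nan

# Every comparison involving NaN is False, so NaN has no consistent
# position relative to the other values.
comparisons = [nan < 1, nan > 1, nan == 1, nan == nan]
print(comparisons)  # [False, False, False, False]

# Because of that, where the NaN ends up after sorting can depend on
# where it started in the input.
print(sorted([nan, 1, 2, 3]))
print(sorted([1, nan, 2, 3]))
```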

Other possible answers would be 1.5 or 2.5 (if the sorting method ended up putting NaNs at the bottom or top of the order), based on the definition of the median as the value that half the values are greater than (or less than). In one sense, if there isn't *A* right answer, NO answer is right.
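The four candidate answers discussed here (nan, 2, 1.5, 2.5) can be reproduced with a small sketch. The helper `median_with_nan_policy` and its policy names are hypothetical, made up for illustration; nothing like it exists in the statistics module. Placing NaNs "at the bottom" or "at the top" is emulated by substituting -inf or +inf, which is an assumption of this sketch and not how any particular sort would behave.

```python
import math
import statistics


def median_with_nan_policy(values, policy):
    """Hypothetical helper: median with an explicit choice for NaNs.

    policy is one of "propagate", "omit", "low", "high".
    Assumes all values are real numbers (int or float).
    """
    data = list(values)
    if not any(math.isnan(x) for x in data):
        return statistics.median(data)
    if policy == "propagate":
        # Any NaN in the input makes the result NaN.
        return math.nan
    if policy == "omit":
        # Ignore NaNs entirely (raises StatisticsError if nothing is left).
        return statistics.median([x for x in data if not math.isnan(x)])
    if policy in ("low", "high"):
        # Treat NaN as sorting below (or above) every other value.
        # Caveat: if a substituted infinity lands in the middle,
        # the result is itself infinite.
        fill = -math.inf if policy == "low" else math.inf
        return statistics.median([fill if math.isnan(x) else x for x in data])
    raise ValueError(f"unknown policy: {policy!r}")


data = [math.nan, 1, 2, 3]
print(median_with_nan_policy(data, "propagate"))  # nan
print(median_with_nan_policy(data, "omit"))       # 2
print(median_with_nan_policy(data, "low"))        # 1.5
print(median_with_nan_policy(data, "high"))       # 2.5
```

Each policy is defensible on its own, which is the point: the caller has to pick one, the data cannot.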

As was pointed out, the statistics module specifically doesn't claim to replace more powerful packages, like Numpy, so expecting it to handle this level of nuance is beyond its specification.

--
Richard Damon
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/UTXEYLXLCGNUQR2XPF3QKMJUZA3UIGJT/