On 12/26/19 3:14 PM, David Mertz wrote:
> Maybe we can just change the function signature:
> statistics.median(it, do_wrong_ass_thing_with_nans=False)
> :-)
> But yes, the problem is really with sorted(). However, the
> implementation of statistics.median() doesn't HAVE TO use sorted(),
> that's just one convenient way to do it.
Yes, median could do the sort some other way, and in fact the code for
median makes a comment to investigate doing it some other way. The fact
that median doesn't actually need the full list sorted, says that
> There IS NO right answer for `sorted([nan, 1, 2, 3])`. However, there
> is a very plausibly right answer for `statistics.median([nan, 1, 2,
> 3])` ... or rather, both 'nan' and '2' are plausible (one approach is
> what Numpy does, the other is what Pandas does).
Other possible answers would be 1.5 or 2.5 (if the sorting method ended
up putting NaNs at the bottom or top of the order), based on the
definition of the median as the value that half of the values are
greater than (or less than). In one sense, if there isn't *A* right
answer, NO answer is right.
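
To make the four candidate answers concrete, here is a rough sketch.
The helper `median_with_nan_policy` and its policy names are made up
for illustration; nothing like it exists in the statistics module, and
I am not proposing this exact signature.

```python
import math
import statistics


def median_with_nan_policy(values, policy):
    """Hypothetical helper: compute a median under an explicit NaN policy.

    Policies (names invented for this sketch):
      "propagate" - return NaN if any NaN is present
      "omit"      - drop NaNs, then take the median of what is left
      "low"       - order NaNs below every other value
      "high"      - order NaNs above every other value
    """
    values = list(values)
    nans = [v for v in values if math.isnan(v)]
    rest = sorted(v for v in values if not math.isnan(v))

    if policy == "propagate":
        return math.nan if nans else statistics.median(rest)
    if policy == "omit":
        return statistics.median(rest)
    if policy == "low":
        ordered = nans + rest
    elif policy == "high":
        ordered = rest + nans
    else:
        raise ValueError(f"unknown policy: {policy!r}")

    # Plain textbook median over the explicitly ordered list.
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


data = [math.nan, 1, 2, 3]
for policy in ("propagate", "omit", "low", "high"):
    print(policy, median_with_nan_policy(data, policy))
```

For this input, "propagate" gives nan, "omit" gives 2, "low" gives 1.5
and "high" gives 2.5, so each policy is self-consistent but they do not
agree with one another.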
As was pointed out, the statistics module specifically doesn't claim to
replace more powerful packages, like Numpy, so expecting it to handle
this level of nuance is beyond its specification.
--
Richard Damon
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/UTXEYLXLCGNUQR2XPF3QKMJUZA3UIGJT/