[Python-ideas] Re: Fix statistics.median()?

David Mertz Mon, 30 Dec 2019 08:59:37 -0800

On Mon, Dec 30, 2019 at 3:32 AM Andrew Barnert via Python-ideas <
[email protected]> wrote:


> On Dec 29, 2019, at 23:50, Steven D'Aprano <[email protected]> wrote:
> >
> > On Sun, Dec 29, 2019 at 06:23:03PM -0800, Andrew Barnert via Python-ideas wrote:
> >
> >> Likewise, it’s even easier to write ignore-nan yourself than to write the DSU yourself:
> >>
> >>    median = statistics.median(x for x in xs if not x.isnan())
> >
> > Try that with xs = [1, 10**400, 2] and come back to me.
>
> Presumably the end user (unlike the statistics module) knows what data
> they have.


No, Steven is right here.  In Python we might very sensibly mix numeric
datatypes.  But this means we need an `is_nan()` function like some
discussed in these threads, rather than relying on an `.isnan()` method
that ints and floats don't have.  Nor can it simply behave like
math.isnan(), which raises OverflowError for an int such as 10**400.

E.g.:

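import math
import statistics

my_data = {'observation1': 10**400,  # really big amount
           'observation2': 1,  # ordinary size
           'observation3': 2.0,  # ordinary size
           'observation4': math.nan,  # missing data
           }

# is_nan() is the proposed helper from these threads, not an existing function
median = statistics.median_high(x for x in my_data.values() if not is_nan(x))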

The answer '2.0' is plainly right here, and there's no reason we shouldn't
provide it.
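
For concreteness, here is a rough sketch of what such an `is_nan()` might
look like.  The name and the exact set of types it handles are assumptions
for illustration only; nothing like this exists in the stdlib today, and
the versions discussed in these threads may differ.

```python
import math
import statistics
from decimal import Decimal


def is_nan(x):
    """Hypothetical helper: True only if x is a NaN, for mixed numeric types."""
    if isinstance(x, Decimal):
        # Decimal has its own method; it covers quiet and signaling NaNs.
        return x.is_nan()
    if isinstance(x, float):
        # Safe here: x is already a float, so no int-to-float conversion.
        return math.isnan(x)
    # Assumption: anything else (int, Fraction, ...) cannot be a NaN.
    return False


# math.isnan() alone cannot be used on the big int:
try:
    math.isnan(10**400)
    isnan_overflowed = False
except OverflowError:
    isnan_overflowed = True

my_data = {'observation1': 10**400,
           'observation2': 1,
           'observation3': 2.0,
           'observation4': math.nan}

median = statistics.median_high(
    x for x in my_data.values() if not is_nan(x))
print(median)  # 2.0
```

Dispatching on type instead of calling math.isnan() unconditionally is what
keeps the huge int from ever being converted to a float.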

-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.

_______________________________________________
Python-ideas mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/[email protected]/message/3FU3DMZ4NYSG4GVJCMQNFCRH6OOWYHP2/
Code of Conduct: http://python.org/psf/codeofconduct/
