[Python-ideas] Re: Fix statistics.median()?

Andrew Barnert via Python-ideas Mon, 30 Dec 2019 14:21:44 -0800

> On Dec 30, 2019, at 08:55, David Mertz <me...@gnosis.cx> wrote:
>> Presumably the end user (unlike the statistics module) knows what data they 
>> have.
> 
> No, Steven is right here.  In Python we might very sensibly mix numeric 
> datatypes.


The statistics module explicitly doesn’t support doing so. Which means anyone 
who’s doing it anyway is into “experienced user” territory, and ought to know 
what they’re doing.

At any rate, I wasn’t arguing that we don’t need a NaN test function in 
statistics. My point, lost by snipping off all the context, was nearly the 
opposite. The fact that you can NaN-filter things yourself (more easily than 
the statistics module can) doesn’t mean the module shouldn’t offer an ignore 
option. By the same token, the fact that you can decorate-sort-undecorate (DSU) 
things yourself (less easily than using a key function) doesn’t mean the module 
shouldn’t offer a key parameter.
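For reference, DSU applied to a median might look like this minimal sketch. `statistics.median_low` has no `key` parameter, so each item is decorated as `(key(item), index, item)`. The index breaks ties, so the items themselves never get compared. The helper name `median_low_by_key` is hypothetical.

```python
import statistics


def median_low_by_key(items, key):
    """Hypothetical helper: low median of items ordered by key(item), via DSU."""
    # Decorate: the index breaks ties, so items never need to be comparable.
    decorated = [(key(item), i, item) for i, item in enumerate(items)]
    # Sort: median_low sorts the decorated tuples and picks the low median.
    _key_value, _index, item = statistics.median_low(decorated)
    # Undecorate: hand back the original item.
    return item


print(median_low_by_key(["apple", "fig", "banana"], key=len))  # apple
```

A `key` parameter would just fold those three steps into the call.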

(There may be other good arguments against a key parameter. All three of the 
alternate orders anyone has asked for or suggested turned out to be spurious, 
and nobody can think of a good use for a different one; that’s a pretty good 
argument for YAGNI. But it doesn’t make the bogus argument, “theoretically you 
could do it yourself, so we don’t need to offer it no matter how useful it is,” 
any less bogus.)

> But this means we need an `is_nan()` function like some discussed in these 
> threads, not rely on a method (and not the same behavior as math.isnan()).

Wait, what’s wrong with the behavior of math.isnan for floats? If you want a 
NaN test that differs from the one defined by IEEE, I think we’re off into 
uncharted waters.

Let’s get concrete: say we have a function that tries the method, and, on 
exception, tries math for floats, returns false for other Numbers, and finally 
raises a TypeError if all of the above failed. (If this were a general thing 
rather than a statistics thing, add trying cmath too.)
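A minimal sketch of that function follows. It assumes “the method” means an `is_nan()` method like `decimal.Decimal.is_nan()`, and that a missing attribute (`AttributeError`) is the exception to fall back on. The name `is_nan` is hypothetical, not an existing `statistics` function.

```python
import math
import numbers
from decimal import Decimal
from fractions import Fraction


def is_nan(x):
    """Hypothetical NaN test for the statistics module's supported types."""
    # 1. Try the value's own is_nan() method (e.g. decimal.Decimal.is_nan).
    try:
        method = x.is_nan
    except AttributeError:
        pass
    else:
        return method()
    # 2. Fall back to the IEEE test in math for floats.
    if isinstance(x, float):
        return math.isnan(x)
    # 3. Other Numbers (int, Fraction, ...) have no NaN value.
    #    A general-purpose version would also try cmath.isnan for complex.
    if isinstance(x, numbers.Number):
        return False
    # 4. Anything else is not a number we know how to test.
    raise TypeError(f"cannot test {type(x).__name__} value for NaN")


print([is_nan(v) for v in (math.nan, 2.0, Decimal("NaN"), 10**400, Fraction(1, 3))])
# [True, False, True, False, False]
```

Ints (even `10**400`) and Fractions fall through to `False` without ever being converted to float.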

What values of what types does that not serve? People keep trying to come up 
with “better” NaN tests than the obvious one, but better for what? If you don’t 
have an actual problem to solve, what use is a solution, no matter how clever?

> E.g.:
> 
> my_data = {'observation1': 10**400,  # really big amount
>            'observation2': 1,  # ordinary size
>            'observation3': 2.0,  # ordinary size
>            'observation4': math.nan}  # missing data
> 
> median = statistics.median_high(x for x in my_data.values() if not is_nan(x))
> 
> The answer '2.0' is plainly right here, and there's no reason we shouldn't 
> provide it.

Wait, are you arguing that we should just offer a generic is_nan function (as a 
builtin?), instead of adding an on_nan handler parameter to median and friends?

If so, apologies; I guess I was disagreeing with someone else’s very different 
position above, not yours.

This helps users who are sophisticated enough to intentionally use NaNs for 
missing data, and to know they want to filter them out of a median, and to know 
how to do that with a genexpr, and to know when you can and can’t safely ignore 
the docs on which inputs are supported by statistics, but not sophisticated 
enough to write an isnan test for their mix of two types. But do any such users 
exist?
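For an int/float mix like the one quoted above, that test is a one-liner: only floats can be NaN, so check the type before calling `math.isnan`. A bare `math.isnan(x)` would not do, because it converts its argument to float and raises `OverflowError` for `10**400`. Here is a minimal sketch; the helper name `is_float_nan` is hypothetical.

```python
import math
import statistics


def is_float_nan(x):
    # In an int/float mix only floats can be NaN. Checking the type first
    # means huge ints never reach math.isnan, whose float conversion overflows.
    return isinstance(x, float) and math.isnan(x)


data = [10**400, 1, 2.0, math.nan]  # the values from the quoted example
median = statistics.median_high(x for x in data if not is_float_nan(x))
print(median)  # 2.0
```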

Writing a NaN test that works for your values even though you intentionally 
mixed two types isn’t the hard part. It’s knowing what to do with that NaN test.
Which still isn’t all that hard, but it’s something a lot of novices haven’t 
learned yet. I think there are a lot more users of the statistics module who 
would be helped by raise and ignore options on median than by just giving them 
the simple tools to build that behavior themselves and hoping they figure out 
that they need to.
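Raise and ignore options might look something like this sketch. `median_on_nan` and its `on_nan` values are hypothetical and are not part of the `statistics` module. It reuses a float-only NaN test, so it assumes data whose only NaN-capable type is float.

```python
import math
import statistics


def _is_float_nan(x):
    return isinstance(x, float) and math.isnan(x)


def median_on_nan(data, on_nan="raise"):
    """Hypothetical median with an explicit NaN policy: 'raise' or 'ignore'."""
    if on_nan not in ("raise", "ignore"):
        raise ValueError(f"unknown on_nan policy: {on_nan!r}")
    values = list(data)
    if any(_is_float_nan(x) for x in values):
        if on_nan == "raise":
            # StatisticsError is a subclass of ValueError.
            raise statistics.StatisticsError("median of data containing NaN")
        # 'ignore': drop the NaNs and take the median of what remains.
        values = [x for x in values if not _is_float_nan(x)]
    return statistics.median(values)


print(median_on_nan([3, 1, math.nan, 2], on_nan="ignore"))  # 2
```

Making the policy a required choice up front is what helps the novice: they get either an error or an explicit filter, never a silently order-dependent answer.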

_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/SDGMTW6LUQBWGB6JYTPKDVQP4D6IVULX/
Code of Conduct: http://python.org/psf/codeofconduct/
