On 2021-08-23 20:53, Steven D'Aprano wrote:
So I propose that statistics functions gain a keyword only parameter to
specify the desired behaviour when a NAN is found:
- raise an exception
- return NAN
- ignore it (filter out NANs)
which seem to be the three most common preferences. (It seems to be
split roughly equally between the three.)
Thoughts? Objections?
I agree that these are the three options that should be available
because they're the most commonly used ones in other tools that handle
NANs (like numpy and pandas).
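
To make the three options concrete, here is a rough sketch of what such a
keyword could look like. The function name `nan_aware_median` and the
`nan_policy` parameter (with values "raise", "return", "ignore") are made up
for illustration; nothing like this exists in the statistics module, and the
sketch only looks for float NANs.

```python
import math
import statistics


def _is_nan(x):
    # Assumption: only float NANs are considered (not Decimal NANs).
    return isinstance(x, float) and math.isnan(x)


def nan_aware_median(data, *, nan_policy="return"):
    """Median with explicit NAN handling (hypothetical API sketch)."""
    if nan_policy not in ("raise", "return", "ignore"):
        raise ValueError(f"unknown nan_policy: {nan_policy!r}")
    values = list(data)
    if any(_is_nan(x) for x in values):
        if nan_policy == "raise":
            raise ValueError("NAN found in data")
        if nan_policy == "return":
            return math.nan
        # nan_policy == "ignore": filter the NANs out before computing.
        values = [x for x in values if not _is_nan(x)]
    return statistics.median(values)


data = [1.0, math.nan, 3.0, 2.0]
print(nan_aware_median(data))                       # nan
print(nan_aware_median(data, nan_policy="ignore"))  # 2.0
```

With "ignore", data consisting only of NANs ends up empty, so
`statistics.median` raises its usual StatisticsError for empty data.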
Does anyone have any strong feelings about what should be the default?
I'm conflicted. The NAN-aware tool I use most is Pandas, which for the
most part handles NANs by filtering them out, and this is very handy.
But that's partly because Pandas has a lot of NAN-awareness built in
(making it easy to, for instance, fill in NANs with some default or
imputed value).
I think I'd lean toward "return NAN" as the best default, as it seems
most consistent with how NAN works in ordinary mathematical expressions
(e.g., `2 + nan`).
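
A quick illustration of that propagation, using only plain floats and the
math module:

```python
import math

nan = float("nan")

# Once a NAN enters an arithmetic expression, the result is NAN.
results = [
    2 + nan,
    nan * 0,
    math.sqrt(nan),
    sum([1.0, nan, 3.0]),
]

for r in results:
    print(r)  # prints "nan" each time
```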
One important thing we should think about is whether to add similar
handling to `max` and `min`. These are builtin functions, not in the
statistics module, but they have similarly confusing behavior with NAN:
the result depends on argument order. Compare `max(1, 2, float('nan'))`,
which returns 2, with `max(float('nan'), 1, 2)`, which returns nan. As
long as we're handling this for median and so on, it would be nice to
have the ability to do NAN-aware max and min as well.
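
The snippet below shows the order dependence and one possible shape for a
NAN-aware version. `nan_aware_max` and its `nan_policy` keyword are
hypothetical names used only for this sketch; unlike the builtin, it accepts
only positional values (not a single iterable) and only checks float NANs.

```python
import math

nan = float("nan")

# Every comparison with NAN is False, so whichever value max() sees
# first survives when a NAN is involved.
print(max(1, 2, nan))  # 2
print(max(nan, 1, 2))  # nan


def _is_nan(x):
    # Assumption: only float NANs are considered.
    return isinstance(x, float) and math.isnan(x)


def nan_aware_max(*args, nan_policy="return"):
    """max() over positional values with explicit NAN handling (sketch)."""
    if nan_policy not in ("raise", "return", "ignore"):
        raise ValueError(f"unknown nan_policy: {nan_policy!r}")
    if any(_is_nan(x) for x in args):
        if nan_policy == "raise":
            raise ValueError("NAN found in arguments")
        if nan_policy == "return":
            return math.nan
        # nan_policy == "ignore": drop the NANs before comparing.
        args = [x for x in args if not _is_nan(x)]
    return max(args)


# The result no longer depends on where the NAN appears.
print(nan_aware_max(1, 2, nan))                       # nan
print(nan_aware_max(nan, 1, 2))                       # nan
print(nan_aware_max(nan, 1, 2, nan_policy="ignore"))  # 2
```

A NAN-aware min would follow the same pattern with `min(args)` on the last
line.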
--
Brendan Barnwell
"Do not follow where the path may lead. Go, instead, where there is no
path, and leave a trail."
--author unknown
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/ZSPIO2YAVUXPZM7W7OHQDHZITQ4ZNO2H/