On 2021-08-23 20:53, Steven D'Aprano wrote:
So I propose that statistics functions gain a keyword only parameter to
specify the desired behaviour when a NAN is found:

- raise an exception

- return NAN

- ignore it (filter out NANs)

which seem to be the three most common preferences. (It seems to be
split roughly equally between the three.)

Thoughts? Objections?

I agree that these are the three options that should be available because they're the most commonly used ones in other tools that handle NANs (like numpy and pandas).
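
To make the three options concrete, here is a rough sketch of how they might behave for median. The wrapper `median_with_policy` and the keyword name `nan_policy` are made up for illustration; neither is part of the statistics module, and the real spelling would be up for discussion.

```python
import math
import statistics


def _is_nan(x):
    # Only float NANs are detected; Decimal and Fraction are out of scope here.
    return isinstance(x, float) and math.isnan(x)


def median_with_policy(data, *, nan_policy="raise"):
    """Hypothetical wrapper around statistics.median (not a real API)."""
    if nan_policy not in ("raise", "return", "ignore"):
        raise ValueError(f"unknown nan_policy: {nan_policy!r}")
    values = list(data)
    if any(_is_nan(x) for x in values):
        if nan_policy == "raise":
            raise ValueError("NAN found in data")
        if nan_policy == "return":
            return math.nan
        # nan_policy == "ignore": filter the NANs out before computing
        values = [x for x in values if not _is_nan(x)]
    return statistics.median(values)


data = [1.0, float("nan"), 3.0]
ignored = median_with_policy(data, nan_policy="ignore")   # median of [1.0, 3.0]
returned = median_with_policy(data, nan_policy="return")  # NAN
```

With `nan_policy="raise"` the same call raises ValueError instead.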

Does anyone have any strong feelings about what should be the default?

I'm conflicted. The NAN-aware tool I use most is Pandas, which for the most part handles NANs by filtering them out, and this is very handy. But that's partly because Pandas has a lot of NAN-awareness built in (making it easy to, for instance, fill in NANs with some default or imputed value).

I think I'd lean toward "return NAN" as the best default, as it seems most consistent with how NAN works in ordinary mathematical expressions (e.g., `2 + nan`).
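
For reference, this is the propagation I mean, using only ordinary float arithmetic and the builtin `sum`:

```python
import math

nan = float("nan")

plain = 2 + nan                  # arithmetic with a NAN gives a NAN
total = sum([1.0, nan, 3.0])     # so does a running sum that meets one

print(math.isnan(plain), math.isnan(total))
```

Once a NAN gets into the calculation, it stays in the result, so "return NAN" is what you get if you do the arithmetic by hand.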

One important thing we should think about is whether to add similar handling to `max` and `min`. These are builtin functions, not in the statistics module, but they have similarly confusing behavior with NAN: compare `max(1, 2, float('nan'))` with `max(float('nan'), 1, 2)`. As long as we're handling this for median and so on, it would be nice to have the ability to do NAN-aware max and min as well.
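
The order dependence is easy to reproduce. The helper `max_ignoring_nan` below is hypothetical, just one way a "filter out NANs" version could look; it is not an existing builtin.

```python
import math

nan = float("nan")

late = max(1, 2, nan)    # NAN last: `nan > 2` is False, so 2 is kept
early = max(nan, 1, 2)   # NAN first: nothing compares greater than it, so NAN is kept


def max_ignoring_nan(values):
    """Hypothetical helper: drop float NANs, then defer to the builtin max."""
    return max(x for x in values if not (isinstance(x, float) and math.isnan(x)))


# With the NANs filtered out, argument order no longer matters.
consistent = (max_ignoring_nan([1, 2, nan]), max_ignoring_nan([nan, 1, 2]))
```

Here `late` is 2 while `early` is NAN, even though the two calls receive the same values.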

--
Brendan Barnwell
"Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail."
   --author unknown
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/ZSPIO2YAVUXPZM7W7OHQDHZITQ4ZNO2H/
Code of Conduct: http://python.org/psf/codeofconduct/
