[Python-ideas] NAN handling in statistics functions

Steven D'Aprano Mon, 23 Aug 2021 20:56:09 -0700

At the moment, the handling of NANs in the statistics module is 
implementation-dependent. In practice, that *usually* means that if your 
data has a NAN in it, the result you get will probably be a NAN.


    >>> statistics.mean([1, 2, float('nan'), 4])
    nan

But there are unfortunate exceptions to this:

    >>> statistics.median([1, 2, float('nan'), 4])
    nan
    >>> statistics.median([float('nan'), 1, 2, 4])
    1.5
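
The position dependence most likely comes from sorting. Every ordering 
comparison with a NAN is false, so the sort that median relies on cannot 
place the NAN consistently, and it tends to stay wherever it started. 
Here is a small sketch of that effect. The helper name 
nan_positions_after_sort is made up for illustration, and the exact 
order after sorting is itself implementation-dependent.

```python
import statistics

nan = float('nan')

def nan_positions_after_sort(data):
    """Return the indexes at which NANs end up after sorted(data).

    Hypothetical helper, for illustration only.
    """
    # x != x is true only for a NAN.
    return [i for i, x in enumerate(sorted(data)) if x != x]

# Every ordering comparison involving a NAN is False, and a NAN is
# not even equal to itself.
print(nan < 1, nan > 1, nan == nan)

# Same values, different starting position for the NAN.
for data in ([1, 2, nan, 4], [nan, 1, 2, 4]):
    print(data, nan_positions_after_sort(data), statistics.median(data))
```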

I've spoken to users of other statistics packages and languages, such as 
R, and I cannot find any consensus on what the "right" behaviour should 
be for NANs except "not that!".

So I propose that statistics functions gain a keyword only parameter to 
specify the desired behaviour when a NAN is found:

- raise an exception

- return NAN

- ignore it (filter out NANs)

which seem to be the three most common preferences. (Opinion seems to be 
split roughly equally between the three.)
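
To make the three options concrete, here is a rough sketch written as a 
wrapper around statistics.median. Everything in it is an assumption made 
for illustration: the wrapper name median_with_policy, the keyword name 
nan_policy, and the option strings "raise", "return" and "ignore" are 
hypothetical, not an existing or agreed API. It also assumes the data 
are floats or ints, where x != x is true only for a NAN.

```python
import statistics

_POLICIES = ("raise", "return", "ignore")  # hypothetical option names

def median_with_policy(data, *, nan_policy="raise"):
    """Median with explicit NAN handling (illustrative sketch only).

    nan_policy is a hypothetical keyword-only parameter:
      "raise"  -> raise ValueError if any NAN is present
      "return" -> return a NAN if any NAN is present
      "ignore" -> drop the NANs, then compute the median
    """
    if nan_policy not in _POLICIES:
        raise ValueError(f"unknown nan_policy: {nan_policy!r}")
    values = list(data)
    # A NAN is the only float that is not equal to itself.
    nans = [x for x in values if x != x]
    if nans:
        if nan_policy == "raise":
            raise ValueError("NAN found in data")
        if nan_policy == "return":
            return nans[0]
        # "ignore": keep only the values that are not NANs.
        values = [x for x in values if x == x]
    return statistics.median(values)
```

With this sketch, median_with_policy([float('nan'), 1, 2, 4], 
nan_policy="ignore") computes the median of [1, 2, 4], and the result no 
longer depends on where the NAN sits in the input. The default of 
"raise" used here is arbitrary; which default is best is exactly the 
open question.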

Thoughts? Objections?

Does anyone have any strong feelings about what should be the default? 


-- 
Steve