On 12/30/19 12:45 PM, David Mertz wrote:
On Mon, Dec 30, 2019 at 12:37 PM Richard Damon <rich...@damon-family.org> wrote:

    My preference is that the interpretation that NaN means Missing Data
isn't appropriate for the statistics module.

You need to tell the entire PyData ecosystem, the entire R ecosystem, and pretty much all of Data Science that they are wrong then.  I would generally prefer a different sentinel value as well, but you are saying to refuse to interoperate with hundreds of millions of lines of code that do not meet the rule you have now declared.

I suppose purity beats practicality though.

First, for R and other languages where arrays of data are single-typed, NaN is a sort of reasonable (or at least the least wrong) value. That is the environment where the convention started. There, practicality beats trying to be pure, and once you decide you need a No Data value, NaN is better than -99999 (one of the other historical choices).

In the domain of advanced statistical packages that derive from that history, I can accept that usage when it is used by people who understand its implications and who use packages adapted to Python from that domain. The statistics package does NOT come from that history, and it explicitly refers people to packages that are meant for those usages.

I would note that if median ignored NaNs, then so should things like mean and stdev, which currently do not ignore them but return NaN instead. That is an argument that the 'poison' option should perhaps be the default for median if a NaN policy is added.
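
To make the comparison concrete, here is a minimal sketch of what a 'poison' default could look like. The function name median_with_policy and the policy names "propagate", "omit", and "raise" are assumptions for illustration only; nothing like this exists in the statistics module today.

```python
import math
import statistics


def _is_nan(x):
    # Only floats are checked here; Decimal or Fraction NaNs are out of scope
    # for this sketch.
    return isinstance(x, float) and math.isnan(x)


def median_with_policy(data, nan_policy="propagate"):
    """Hypothetical helper, NOT part of the statistics module.

    nan_policy:
      "propagate" -- the 'poison' behaviour: any NaN makes the result NaN
      "omit"      -- drop NaNs, i.e. treat them as missing data
      "raise"     -- refuse to compute and raise ValueError
    """
    if nan_policy not in ("propagate", "omit", "raise"):
        raise ValueError(f"unknown nan_policy: {nan_policy!r}")
    values = list(data)
    if any(_is_nan(x) for x in values):
        if nan_policy == "propagate":
            return math.nan
        if nan_policy == "raise":
            raise ValueError("NaN encountered in data")
        values = [x for x in values if not _is_nan(x)]
    return statistics.median(values)


data = [1.0, math.nan, 3.0, 4.0]

# mean already lets the NaN poison the result.
print(statistics.mean(data))

# With the 'poison' default, median would agree with mean.
print(median_with_policy(data))

# Treating NaN as missing data has to be requested explicitly.
print(median_with_policy(data, nan_policy="omit"))
```

With "propagate" as the default, median and mean agree on NaN input, and the missing-data interpretation is opt-in rather than assumed.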

--

Richard Damon
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/P3CIUBRFRW6CBZYNLY772A7OSZTX24ND/
Code of Conduct: http://python.org/psf/codeofconduct/
