On Fri, Dec 27, 2019 at 04:32:44AM -0000, Marco Sulla via Python-ideas wrote:

> Think about this: you have a population of 1 million people. You 
> want to take the median of their heart rate. But for some reason, your 
> calculations give you some NaN.

The only reasonable scenario for that is if NANs indicate missing data. 
Otherwise, a NAN is an obvious measurement error, like a negative heart 
rate, or infinity.


> If you remove the NaNs, it's like you remove people from your 
> statistics.

If the value is missing, then you don't know what value it has. It's as 
if you never collected the data in the first place! (That's because you 
*didn't* collect the data -- if you did, it wouldn't be missing.)

How do you deal with data you don't have? Obviously you don't 
include measurements you don't have in your data.
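In Python terms, leaving out missing values before calling `statistics.median` could look like the minimal sketch below. It assumes, as above, that a NAN in the data means "missing". The helper name `median_ignoring_missing` is hypothetical and is not part of the statistics module.

```python
import math
import statistics

def median_ignoring_missing(data):
    """Return the median of data, skipping NANs (assumed to mean 'missing')."""
    # Keep only the values we actually have; math.isnan() flags the missing ones.
    present = [x for x in data if not math.isnan(x)]
    # If every value was missing, statistics.median raises StatisticsError,
    # which is the honest answer: there is no data to summarise.
    return statistics.median(present)

heart_rates = [72.0, float("nan"), 65.0, 80.0, float("nan")]
print(median_ignoring_missing(heart_rates))  # 72.0
```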


> And since the median is the central value of the 
> population, you're faking the result.

That's not faking the result, it is the only reasonable way to handle 
data that is missing and unrecoverable. For example, SPSS simply omits 
missing data from its calculations:

https://stats.idre.ucla.edu/spss/modules/missing-data/

More here:

http://www.real-statistics.com/descriptive-statistics/missing-data/

There is not much that functions like mean, median, etc. can do about 
missing data, other than ignore it, raise an exception, or return a NAN.
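Those three options can be sketched as a single function with a policy argument. The function name `median_with_nan_policy` and its `policy` parameter are hypothetical, for illustration only; they are not the statistics module's API, and the default of "raise" here is an arbitrary choice for the sketch.

```python
import math
import statistics

def median_with_nan_policy(data, policy="raise"):
    """Median that handles NANs according to one of three hypothetical policies.

    policy="ignore":    drop NANs, i.e. treat them as missing data
    policy="raise":     refuse to compute a result if any NAN is present
    policy="propagate": return a NAN if any NAN is present
    """
    data = list(data)
    has_nan = any(math.isnan(x) for x in data)
    if policy == "ignore":
        return statistics.median([x for x in data if not math.isnan(x)])
    if policy == "raise":
        if has_nan:
            raise ValueError("data contains NAN")
        return statistics.median(data)
    if policy == "propagate":
        return math.nan if has_nan else statistics.median(data)
    raise ValueError(f"unknown policy: {policy!r}")
```

None of the three policies recovers the missing values; they only decide what the caller is told about them.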

The statistician collecting the data may *sometimes* be in a good 
position to do something about missing data: for example, chasing up 
respondents and convincing them to provide the missing values, or 
interpolating missing values from the available values ("data 
imputation").

https://measuringu.com/handle-missing-data/
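To make "interpolating missing values" concrete, here is a minimal sketch of one simple imputation technique, linear interpolation. It assumes the readings form an ordered series (say, one person's heart rate sampled over time), which is not the same as a population of independent people. The function `impute_linear` is hypothetical and illustrates the idea only; it is not a recommendation, and it is not something the statistics module provides.

```python
import math

def impute_linear(values):
    """Fill NAN gaps by linear interpolation between the nearest present neighbours.

    Leading or trailing NANs (with a present neighbour on only one side) are
    left as NAN, since there is nothing to interpolate between.
    """
    filled = list(values)
    # Indices of the values we actually have, in order.
    known = [i for i, x in enumerate(filled) if not math.isnan(x)]
    # Walk each pair of consecutive known values and fill the gap between them.
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            t = (i - left) / span
            filled[i] = filled[left] + t * (filled[right] - filled[left])
    return filled

print(impute_linear([60.0, math.nan, 70.0, math.nan, math.nan, 85.0]))
```

On that input the gaps are filled with (approximately) 65, 75 and 80: values nobody measured, which is exactly why such results deserve suspicion.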

Any sort of data imputation is, to my mind, suspicious: you are 
effectively making up data and hoping it is representative. But whether 
data imputation is justified or not, it is *far* out of scope for the 
statistics module.



-- 
Steven
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/F7VK5BFQJ3DKWX2ANIDBLM2PE4ULZF4W/
Code of Conduct: http://python.org/psf/codeofconduct/
