On Fri, Dec 27, 2019 at 04:32:44AM -0000, Marco Sulla via Python-ideas wrote:
> Think about this: you have a population of 1 million of people. You
> want to take the median of their heart rate. But for some reason, your
> calculations gives you some NaN.

The only reasonable scenario for that is if NANs indicate missing data.
Otherwise, a NAN is an obvious measurement error, like a negative heart
rate, or infinity.

> If you remove the NaNs, it's like you remove people from your
> statistics.

If the value is missing, then you don't know what value it has. It's as
if you never collected the data in the first place! (That's because you
*didn't* collect the data -- if you did, it wouldn't be missing.)

How do you deal with data you don't have? Obviously you don't include
measurements you don't have in your data.

> And since the median is the central value of the
> population, you're faking the result.

That's not faking the result, it is the only reasonable way to handle
data that is missing and unrecoverable. For example, SPSS simply omits
missing data from its calculations:

https://stats.idre.ucla.edu/spss/modules/missing-data/

More here:

http://www.real-statistics.com/descriptive-statistics/missing-data/

Dealing with missing data is not really something that functions like
mean, median etc. can deal with other than to ignore it, raise an
exception or return a NAN. The statistician collecting the data may
*sometimes* be in a good position to do something about missing data:
for example, chasing up respondents and convincing them to provide the
missing values, or interpolating missing values from the available
values ("data imputation").

https://measuringu.com/handle-missing-data/

Any sort of data imputation is, to my mind, suspicious: you are
effectively making up data and hoping it is representative. But whether
data imputation is justified or not, it is *far* out of scope for the
statistics module.
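To make the three options above concrete (ignore the NANs, raise an
exception, or return a NAN), here is a minimal sketch. The helper name
median_with_nan_policy and its policy argument are invented for this
illustration; they are not part of the statistics module, and the
heart-rate values are made-up sample data.

```python
import math
import statistics


def _is_nan(x):
    # Only floats can be NaN here; ints and other types pass through.
    return isinstance(x, float) and math.isnan(x)


def median_with_nan_policy(data, policy="omit"):
    """Hypothetical helper: median with an explicit NaN policy.

    policy="omit"   -> treat NaNs as missing data and drop them
    policy="raise"  -> refuse to compute if any NaN is present
    policy="return" -> return NaN if any NaN is present
    """
    values = list(data)
    if not any(_is_nan(x) for x in values):
        return statistics.median(values)
    if policy == "omit":
        # If every value was NaN, statistics.median raises StatisticsError
        # on the empty list, which seems the honest outcome.
        return statistics.median([x for x in values if not _is_nan(x)])
    if policy == "raise":
        raise ValueError("data contains NaN")
    if policy == "return":
        return math.nan
    raise ValueError(f"unknown policy: {policy!r}")


# Made-up sample: one missing measurement recorded as NaN.
heart_rates = [62, 71, float("nan"), 80, 68]
print(median_with_nan_policy(heart_rates))            # 69.5
print(median_with_nan_policy(heart_rates, "return"))  # nan
```

Whichever policy is chosen, the choice is made by the caller, who
knows what a NAN means in their data; the median function itself has no
way to tell missing data from a measurement error.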
-- 
Steven

Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/F7VK5BFQJ3DKWX2ANIDBLM2PE4ULZF4W/