Luc <ouaga...@gmail.com> added the comment:

Just to make sure we are focused on the issue: the reported bug is in the 
statistics library (not in numpy). It occurs when there is at least one 
missing value (NaN) in the data and affects the computation of median, 
median_low and median_high from the statistics library.
The test was performed on Python 3.6.4.

When there are no missing values (NaNs) in the data, median, median_high 
and median_low from the statistics library work fine.
So, yes, removing the NaNs (or imputing them) before computing the 
median(s) resolves the issue.
Also, just as statistics.mean(data) returns nan when data has missing 
values, median, median_high and median_low should behave the same way.
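
To illustrate the behaviour requested above, here is a minimal sketch of a wrapper that returns nan whenever the input contains a NaN. The name median_propagating_nan is hypothetical and not part of the statistics library, and the check only covers float NaNs (which includes np.nan), not Decimal('NaN').

```python
import math
import statistics as stats


def median_propagating_nan(values):
    """Hypothetical helper: return nan if any value is a float NaN,
    otherwise defer to statistics.median."""
    values = list(values)
    # np.nan is an ordinary Python float, so this check also catches it.
    if any(isinstance(v, float) and math.isnan(v) for v in values):
        return math.nan
    return stats.median(values)


print(median_propagating_nan([75, 90, 85, 92, 95, 80, float("nan")]))
print(median_propagating_nan([75, 90, 85, 92, 95, 80]))
```

The same guard could be placed in front of median_high and median_low; whether the library itself should propagate nan or raise an error is a design decision for the maintainers.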

import numpy as np
import statistics as stats

data = [75, 90, 85, 92, 95, 80, np.nan]

Median = stats.median(data) 
Median_high = stats.median_high(data)
Median_low = stats.median_low(data)
print("The incorrect Median is", Median)
The incorrect Median is 90
print("The incorrect median high is", Median_high)
The incorrect median high is 90
print("The incorrect median low is", Median_low)
The incorrect median low is 90

## Mean returns nan
Mean = stats.mean(data)
print("The mean is", Mean)
The mean is nan

Now, when we drop the missing values, we have:
data2 = [75, 90, 85, 92, 95, 80]
stats.median(data2)
87.5
stats.median_high(data2)
90
stats.median_low(data2)
85
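
Rather than retyping the list by hand, the NaNs can be filtered out programmatically. This is a minimal sketch of that workaround; drop_nans is a hypothetical helper name, and it assumes missing values are float NaNs such as np.nan.

```python
import math
import statistics as stats


def drop_nans(values):
    """Hypothetical helper: return a new list without float NaN entries."""
    return [v for v in values if not (isinstance(v, float) and math.isnan(v))]


data = [75, 90, 85, 92, 95, 80, float("nan")]
clean = drop_nans(data)

print(stats.median(clean))       # 87.5
print(stats.median_high(clean))  # 90
print(stats.median_low(clean))   # 85
```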

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue33084>
_______________________________________