On 12/26/19 1:38 PM, Marco Sulla via Python-ideas wrote:
Well, some days ago I didn't know about the `statistics` module, so I
wrote my own median implementation, which I improved with the help of a
private discussion:

```
import math

def median(it, member=False, sort_fn=sorted, **kwargs):
    if sort_fn is None:
        # Don't sort. The caller must be careful to pass an already sorted iterable
        sorted_it = it
    else:
        sorted_it = sort_fn(it, **kwargs)

    try:
        len_it = len(sorted_it)
    except TypeError:
        # Generator, iterator and the like: materialize so it can be indexed
        sorted_it = tuple(sorted_it)
        len_it = len(sorted_it)

    if len_it == 0:
        raise ValueError("Iterable is empty")

    index = len_it // 2

    if isEven(len_it):
        # The two central values of the sorted data
        res1 = sorted_it[index]
        res2 = sorted_it[index-1]

        if math.isnan(res1):
            return res2

        if math.isnan(res2):
            return res1

        if member:
            # To remove bias
            if isEven(index):
                return min(res1, res2)
            else:
                return max(res1, res2)

        else:
            res = (res1 + res2) / 2
    else:
        res = sorted_it[index]

    return res


def isEven(num):
    return num % 2 == 0

```

As you can see, with `sort_fn` you can pass another sort function, maybe
the pandas one (even if I do not recommend it, pandas is slow). Or you
can pass None and sort the iterable yourself beforehand. Maybe you have
already sorted the iterable, so there's no reason to sort it again.
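
The `sort_fn` hook described above can be looked at in isolation. The following is only a minimal sketch: `sorted_or_as_is` is a hypothetical helper name, not part of the posted code or of any library, and it assumes that any extra keyword arguments are meant for the sort function.

```python
def sorted_or_as_is(it, sort_fn=sorted, **kwargs):
    """Hypothetical helper isolating the sort_fn hook from median()."""
    if sort_fn is None:
        # The caller promises the data is already sorted; only materialize it.
        return tuple(it)
    # Otherwise delegate to the sort function, forwarding keyword arguments.
    return tuple(sort_fn(it, **kwargs))


default_sorted = sorted_or_as_is([3, 1, 2])
presorted = sorted_or_as_is(iter([1, 2, 3]), sort_fn=None)
case_insensitive = sorted_or_as_is(["b", "A", "c"], key=str.lower)
```

With `sort_fn=None` the helper does no checking at all, so an unsorted input would silently give a wrong median.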

Furthermore, if the iterable has even length and the elements are not
numbers, you can calculate the median in a predictable way by choosing
member=True. It will return one of the two central elements, in an
unbiased way. So you don't need median_high() or median_low() in
these cases.
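
For comparison, the member=True rule can be expressed with the standard library's `statistics.median_low()` and `statistics.median_high()`. This is a sketch under assumptions: `median_member` is a hypothetical name, and it mirrors only the alternation rule (low when `len // 2` is even, high when it is odd), not the NaN handling.

```python
import statistics


def median_member(values):
    """Hypothetical helper: always return an element of the data."""
    data = sorted(values)
    if not data:
        raise ValueError("Iterable is empty")
    index = len(data) // 2
    if len(data) % 2 == 1:
        # Odd length: there is a single central element.
        return data[index]
    if index % 2 == 0:
        # Even index: take the lower of the two central elements.
        return statistics.median_low(data)
    # Odd index: take the higher of the two central elements.
    return statistics.median_high(data)


four_letters = median_member("abcd")    # index 2 is even, lower central element
six_letters = median_member("abcdef")   # index 3 is odd, higher central element
```

Because no arithmetic is performed, this works for any elements that can be ordered with `<`, such as strings.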

Finally, if the iterable has even length and one of the two central
values is NaN, the other value is returned. The function returns NaN
only if both are NaNs.

Note that this function still has issues with NaN values unless you use a sort function other than sorted. sorted doesn't check that the items are strictly comparable; it just assumes that < provides a total order, which NaN values break. As a result, sorted can produce an unsorted result when the input contains items like NaN that violate that assumption.
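
The point about sorted and NaN can be illustrated with a short sketch. The `drop_nans` helper is hypothetical, and filtering NaNs out before sorting is just one possible workaround, not something proposed in the thread.

```python
import math
import statistics

nan = float("nan")
data = [3.0, nan, 1.0, 2.0]

# Every ordering comparison involving NaN is False, so `<` is not a total order.
comparisons = (nan < 1.0, nan > 1.0, nan == nan)

# sorted() trusts `<`, so where the other values end up can depend on
# where the NaN happens to sit in the input.
maybe_unsorted = sorted(data)


def drop_nans(values):
    """Hypothetical helper: remove float NaNs so that `<` is a total order again."""
    return [v for v in values if not (isinstance(v, float) and math.isnan(v))]


clean = sorted(drop_nans(data))
clean_median = statistics.median(clean)
```

Whether dropping NaNs is acceptable depends on the application; sometimes a NaN should propagate or raise an error instead.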

--
Richard Damon
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/FKHRFJSMEJ24YYHFGFLNKBW4SGXWCU6U/
Code of Conduct: http://python.org/psf/codeofconduct/
