Raymond Hettinger <raymond.hettin...@gmail.com> added the comment:

Thanks for taking a detailed look.  I'll explore the links you provided shortly.

The API is designed to be extendable so that we don't get trapped by the choice 
of computation method.  If needed, any or all of the following extensions are 
possible without breaking backward compatibility:

  quantiles(data, n=4, already_sorted=True) # Skip resorting
  quantiles(data, cut_points=[0.02, 0.25, 0.50, 0.75, 0.98]) # box-and-whiskers
  quantiles(data, interp_method='nearest') # also: "low", "high", "midpoint"
  quantiles(data, inclusive=True)    # For description of a complete population

The default approach used in the PR matches what is used by MS Excel's 
PERCENTILE.EXC function¹.  That has several virtues. It is easy to explain.  It 
allows two unequally sized datasets to be compared (perhaps with a QQ plot) to 
explore whether they are drawn from the same distribution.  For sampled data, 
the quantiles tend to remain stable as more samples are added.  For samples 
from a known distribution (e.g. normal variates), it tends to give the same 
results as inv_cdf():

    >>> iq = NormalDist(100, 15)
    >>> cohort = iq.samples(10_000)
    >>> for ref, est in zip(quantiles(iq, n=10), quantiles(cohort, n=10)):
    ...     print(f'{ref:5.1f}\t{est:5.1f}')
    ...
     80.8        81.0
     87.4        87.8
     92.1        92.3
     96.2        96.3
    100.0       100.1
    103.8       104.0
    107.9       108.0
    112.6       112.9
    119.2       119.3

My thought was to start with something like this and only add options if they 
get requested (the most likely request is an inclusive=True option to emulate 
MS Excel's PERCENTILE.INC).  
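
To illustrate the difference such an option would make, here is a small 
comparison.  It assumes the statistics module as released in Python 3.8, where 
the option is spelled method='inclusive' rather than the inclusive=True 
floated above; the sample data is made up for the example.

```python
from statistics import quantiles

# Five made-up data points, already in order.
data = [10, 20, 30, 40, 50]

# Default (exclusive) method: treats the data as a sample, so the
# quartiles can fall outside the middle of the observed range.
exclusive = quantiles(data, n=4)

# Inclusive method: treats the data as the complete population, so
# the minimum and maximum are the 0th and 100th percentiles.
inclusive = quantiles(data, n=4, method='inclusive')

print(exclusive)   # [15.0, 30.0, 45.0]
print(inclusive)   # [20.0, 30.0, 40.0]
```

The two methods agree on the median here and differ only in how far the outer 
cut points sit from it.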

If we need to leave the exact method unguaranteed, that's fine.  But I think it 
would be better to guarantee the match to PERCENTILE.EXC and then handle other 
requests through API extensions rather than revisions.
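
To make concrete what such a guarantee would pin down, here is a minimal 
sketch of the exclusive rule: the i-th of n cut points sits at position 
i*(len(data)+1)/n in the sorted data, with linear interpolation between 
neighbors.  The function name exclusive_quantiles is hypothetical and the code 
is an illustration under that assumption, not the code from the PR.

```python
def exclusive_quantiles(data, n=4):
    """Hypothetical sketch of PERCENTILE.EXC-style cut points."""
    data = sorted(data)
    size = len(data)
    if size < 2:
        raise ValueError('need at least two data points')
    m = size + 1                      # exclusive method uses len + 1
    cuts = []
    for i in range(1, n):
        # 1-based index of the data point at or below position i*m/n,
        # clamped so that both neighbors exist.
        j = min(max(i * m // n, 1), size - 1)
        # Numerator of the fractional distance past data[j - 1].
        delta = i * m - j * n
        # Linear interpolation between the two neighboring points.
        cuts.append((data[j - 1] * (n - delta) + data[j] * delta) / n)
    return cuts

print(exclusive_quantiles(range(1, 12)))   # [3.0, 6.0, 9.0]
```

For eleven evenly spaced points the quartiles land exactly on data points, 
which makes the rule easy to check by hand.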


¹ https://exceljet.net/excel-functions/excel-percentile.exc-function

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue36546>
_______________________________________