Hi all,
there is a PR adding very limited support for weights in quantiles.
Absent further input, I will probably merge it, since the sklearn
devs say that they will use it. This means adding a `weights`
kwarg [1]. See:
https://github.com/numpy/numpy/pull/24254
Limited here means that it would only work for the "inverted_cdf"
method (which is not the default one).
Why is it so limited? Because this limited version is the only form
that we/I am reasonably confident we can get right.
There are various problems with making it more broad:
1. Weights are not clearly defined and can have many meanings, e.g.:
* frequency weights (repeated observations)
* probability weights (removing sample biases)
* "analytic"/"precision" weights (encoding observation
precision/variance).
2. There is little to no literature on how to handle the following
subtleties in the context of the various types of weights:
* Interpolation (relevant to all interpolating methods)
* Unbiasing (the main difference between the methods)
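
To make the interpolation problem concrete, here is a small sketch
(an illustration only, not code from the PR). It uses a plain-Python
version of the default "linear" method, whose position is (n - 1) * q,
and applies two readings of the same weights. The helper
`linear_quantile` and both "readings" are assumptions made for the
example: frequency weights are taken as repeated observations, and
equal probability weights are taken to mean "no reweighting".

```python
import math


def linear_quantile(values, q):
    """Unweighted quantile, linear interpolation at position (n - 1) * q."""
    vals = sorted(values)
    pos = (len(vals) - 1) * q
    lo = math.floor(pos)
    hi = min(lo + 1, len(vals) - 1)
    return vals[lo] + (pos - lo) * (vals[hi] - vals[lo])


values = [1.0, 2.0]
weights = [2, 2]

# Frequency reading: each observation is repeated `weight` times,
# so n becomes sum(weights) == 4.
expanded = [v for v, w in zip(values, weights) for _ in range(w)]
as_frequency = linear_quantile(expanded, 0.25)

# Probability reading (assumed): equal weights change nothing,
# so n stays at the number of observations == 2.
as_probability = linear_quantile(values, 0.25)

print(as_frequency, as_probability)  # 1.0 1.25
```

The two readings disagree because n enters the interpolation
position, and it is not obvious which n a weighted version should use.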
The PR adds the most minimal thing, where the different kinds of
weights are largely equivalent (no unbiasing issues, no
interpolation). [2]
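
As a rough sketch of why the weight types coincide for "inverted_cdf"
(again an illustration, not the PR's implementation): the quantile is
the smallest value whose cumulative weight reaches q times the total
weight. The function name `weighted_inverted_cdf` and the choice to
drop zero-weight observations are assumptions made for this example.

```python
from bisect import bisect_left
from itertools import accumulate


def weighted_inverted_cdf(values, weights, q):
    """Smallest value whose cumulative weight is >= q * total weight."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    # Drop zero-weight observations, then sort by value (a full sort).
    pairs = sorted((v, w) for v, w in zip(values, weights) if w > 0)
    if not pairs:
        raise ValueError("need at least one positive weight")
    cum = list(accumulate(w for _, w in pairs))
    idx = bisect_left(cum, q * cum[-1])
    return pairs[min(idx, len(pairs) - 1)][0]


values = [1, 2, 3, 4]
freq = [1, 1, 2, 4]

# Frequency weights give the same answer as repeating observations.
expanded = [v for v, w in zip(values, freq) for _ in range(w)]
by_weights = weighted_inverted_cdf(values, freq, 0.5)
by_repeats = weighted_inverted_cdf(expanded, [1] * len(expanded), 0.5)

# Rescaling the weights to sum to 1 does not change the result either.
by_rescaled = weighted_inverted_cdf(values, [w / 8 for w in freq], 0.5)

print(by_weights, by_repeats, by_rescaled)  # 3 3 3
```

Only the ratio of cumulative to total weight matters here, so the
overall scale of the weights drops out and no interpolation position
has to be chosen.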
Due to these complexities (and the lack of many statistics specialists
looking at it) there is a case to be made that we just shouldn't add
this in NumPy, but if nobody else has an opinion, I will go with the
sklearn devs who want it :).
(Also, with weights we have to rely on a full sort for now, which can
be slow; I can live with that personally.)
- Sebastian
[1] There are different styles of weights and for some methods that
clearly matters. Thus, if we ever expand the definition, it may be
that `weights` has to be mapped to one of these, or that the generic
`weights` kwarg would raise an error for those methods, telling you
to pick a specific one such as `pweights=` or `fweights=`.
[2] I am not quite sure about "analytic weights" here, but to me these
do not really make sense in the context of a discrete interpolation
method.