[
https://issues.apache.org/jira/browse/STATISTICS-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17856445#comment-17856445
]
Alex Herbert commented on STATISTICS-85:
Added int[] quantile in commit:
2ef1d41becd3e92683a79f6455be35d414496d7d
> Quantile implementation
> ---
>
> Key: STATISTICS-85
> URL: https://issues.apache.org/jira/browse/STATISTICS-85
> Project: Commons Statistics
> Issue Type: New Feature
> Components: descriptive
> Reporter: Alex Herbert
> Priority: Major
> Fix For: 1.1
>
> Attachments: 100QuantilesRandomDataLength1.png,
> MedianRandomData.png
>
>
> Add a quantile implementation. This will interpolate the value of a sorted
> array of data for probability p in [0, 1].
> Replace the legacy API from Commons Math Percentile with an updated API. The
> new API should:
> * Decouple the estimation of quantile positions inside data of length n from
> the selection of correctly-ordered indices in array data.
> * Support multiple data types.
> * Support pre-sorted data.
> * Avoid performance issues observed in the CM Percentile implementation.
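To illustrate interpolating sorted data for a probability p, here is a minimal sketch of the Hyndman and Fan type 8 estimate, which the proposed API lists as the default. The class name Hf8Sketch and the method body are hypothetical and written for this note only. The sketch assumes the usual 1-based position h = (n + 1/3) p + 1/3 clipped to [1, n]; the library implementation may differ in detail.

```java
import java.util.function.IntToDoubleFunction;

/** Hypothetical illustration only; not the Commons Statistics implementation. */
final class Hf8Sketch {
    private Hf8Sketch() {}

    /**
     * Estimate the p-quantile of sorted data of size n.
     * The accessor a returns the value at a 0-based index of the sorted data.
     */
    public static double quantile(int n, IntToDoubleFunction a, double p) {
        if (n <= 0) {
            throw new IllegalArgumentException("No data");
        }
        if (!(p >= 0 && p <= 1)) {
            throw new IllegalArgumentException("Invalid probability: " + p);
        }
        // Real-valued 1-based position for Hyndman and Fan type 8 (assumed form)
        double h = (n + 1.0 / 3) * p + 1.0 / 3;
        // Clip to the valid 1-based range [1, n]
        h = Math.max(1, Math.min(n, h));
        final int j = (int) Math.floor(h);
        final double g = h - j;
        final double lower = a.applyAsDouble(j - 1);
        if (g == 0) {
            // Exactly on a data point; this also covers j == n
            return lower;
        }
        final double upper = a.applyAsDouble(j);
        // Linear interpolation between the two neighbouring order statistics
        return lower + g * (upper - lower);
    }
}
```

For sorted data {1, 2, 3, 4} and p = 0.5 the position is h = 2.5, so the estimate lies halfway between the 2nd and 3rd values. Estimating the position is kept separate from reading the data through the accessor, which mirrors the decoupling requested in the list above.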
> h2. Proposed API
> {code:java}
> org.apache.commons.statistics.descriptive
> public final class Quantile {
> // overwrite=true; EstimationMethod.HF8; NaNPolicy.ERROR
> public static Quantile withDefaults();
> public Quantile withOverwrite(boolean);
> public Quantile with(EstimationMethod);
> // Could support NaN handling ... see below for NaNPolicy
> public Quantile with(NaNPolicy);
> // Create n uniform probabilities in range [p1, p2]
> public static double[] probabilities(int n);
> public static double[] probabilities(int n, double p1, double p2);
> // Quantiles on sorted data a of size n
> public double evaluate(int n, java.util.function.IntToDoubleFunction a,
> double p);
> public double[] evaluate(int n, java.util.function.IntToDoubleFunction a,
> double... p);
> // Quantiles on the primitive types that cannot be easily sorted
> public double evaluate(double[] a, double p);
> public double[] evaluate(double[] a, double... p);
> public double evaluate(int[] a, double p);
> public double[] evaluate(int[] a, double... p);
> public double evaluate(long[] a, double p);
> public double[] evaluate(long[] a, double... p);
> public double evaluate(float[] a, double p);
> public double[] evaluate(float[] a, double... p);
> // Provide the 9 methods in Hyndman and Fan (1996)
> // Sample Quantiles in Statistical Packages.
> // The American Statistician, 50, 361-365.
> public abstract class Quantile$EstimationMethod extends
> java.lang.Enum {
> public static final Quantile$EstimationMethod HF1;
> public static final Quantile$EstimationMethod HF2;
> public static final Quantile$EstimationMethod HF3;
> public static final Quantile$EstimationMethod HF4;
> public static final Quantile$EstimationMethod HF5;
> public static final Quantile$EstimationMethod HF6;
> public static final Quantile$EstimationMethod HF7;
> public static final Quantile$EstimationMethod HF8;
> public static final Quantile$EstimationMethod HF9;
> }
> }
> {code}
> Note: The CM API used the 9 methods from Hyndman and Fan but labelled them
> as R1-9; this may be derived from the same names used in the R language. I
> propose to rename them as HF1-9 to reflect their origin.
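The description of probabilities(int n) does not say how the n values are spaced or whether the end points are included. As one plausible reading, the sketch below assumes n evenly spaced interior points, p1 + (p2 - p1) * i / (n + 1) for i = 1..n. The class name ProbabilitiesSketch and the spacing rule are assumptions made for this note and are not taken from the issue.

```java
/** Hypothetical illustration only; the spacing rule is an assumption. */
final class ProbabilitiesSketch {
    private ProbabilitiesSketch() {}

    /** n evenly spaced probabilities strictly inside (0, 1). */
    public static double[] probabilities(int n) {
        return probabilities(n, 0, 1);
    }

    /** n evenly spaced probabilities strictly inside (p1, p2). */
    public static double[] probabilities(int n, double p1, double p2) {
        if (n < 1) {
            throw new IllegalArgumentException("Invalid number of probabilities: " + n);
        }
        if (!(p1 >= 0 && p1 < p2 && p2 <= 1)) {
            throw new IllegalArgumentException("Invalid range: [" + p1 + ", " + p2 + "]");
        }
        final double[] p = new double[n];
        final double width = p2 - p1;
        for (int i = 0; i < n; i++) {
            // Interior points only: (i + 1) / (n + 1) is in (0, 1)
            p[i] = p1 + width * (i + 1) / (n + 1);
        }
        return p;
    }
}
```

Under this assumption probabilities(3) gives the quartile boundaries 0.25, 0.5 and 0.75, which can then be passed to a multi-probability evaluate call.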
> h2. NaNPolicy
> There are multiple options here. For reference, R and Python's numpy only
> provide the option to exclude NaN:
> * R: quantile raises an error if NaN is present. median returns NaN. There
> is an option to exclude NaN.
> * numpy: two methods are provided: median/nanmedian + quantile/nanquantile
> (the non-nan versions will return NaN if any NaNs are present)
> Commons Math provides a remapping. Note the Statistics ranking module has the
> same NaNStrategy as that in CM:
> * MINIMAL: map to -infinity
> * MAXIMAL: map to +infinity
> * REMOVED: ignore from the data
> * FIXED: leave in place. This makes no sense for quantiles. It is implemented
> by moving NaN to the end, following the order imposed by Double.compare.
> * FAILED: raise an exception
> I favour the simpler options of: treating NaN as above/below all other
> values; removing NaN from the data; or raising an exception. I do not see a
> requirement to remap NaN to infinity; this can be done by the user.
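The three simpler options can be sketched as a pre-processing step ahead of quantile estimation. The class NaNHandlingSketch and the enum constants MOVE_TO_END, REMOVE and FAIL are hypothetical names chosen for this note. The sketch uses a full Arrays.sort for brevity, which relies on the documented ordering where NaN sorts above all other values; a real implementation would likely avoid a full sort.

```java
import java.util.Arrays;

/** Hypothetical illustration only; names are not from the proposed API. */
final class NaNHandlingSketch {
    private NaNHandlingSketch() {}

    /** Hypothetical actions matching the three options discussed above. */
    enum Action { MOVE_TO_END, REMOVE, FAIL }

    /** Return a sorted copy of the data with NaN handled as requested. */
    public static double[] prepare(double[] data, Action action) {
        final double[] a = data.clone();
        // Arrays.sort uses the Double.compareTo total order: NaN is placed last
        Arrays.sort(a);
        // Count of non-NaN values at the start of the sorted copy
        int n = a.length;
        while (n > 0 && Double.isNaN(a[n - 1])) {
            n--;
        }
        if (n == a.length) {
            // No NaN present; all actions behave the same
            return a;
        }
        switch (action) {
            case MOVE_TO_END:
                // NaN values take part in the data as the largest values
                return a;
            case REMOVE:
                // Drop the trailing NaN values
                return Arrays.copyOf(a, n);
            default:
                throw new IllegalArgumentException("NaN values in data: " + (a.length - n));
        }
    }
}
```

With REMOVE the effective data length shrinks, so any quantile position must be computed from the returned length and not from the original one.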
> The API can be simplified by using:
> {code:java}
> public final class NaNPolicy extends java.lang.Enum {
> public static final NaNPolicy LAST;// Move to end of data
> public static final NaNPolicy FIRST; // Move to start of data
> public static final NaNPolicy REMOVE; // Remove from data
> public static final NaNPolicy ERR