[ https://issues.apache.org/jira/browse/STATISTICS-32?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388891#comment-17388891 ]

Alex Herbert commented on STATISTICS-32:
----------------------------------------

I have attempted to implement this for the discrete distributions, using R as the reference for high-precision result values. See [PR 28|https://github.com/apache/commons-statistics/pull/28].

Two distributions use the RegularizedBeta function to compute the CDF and survival function:
- BinomialDistribution
- PascalDistribution (i.e. a negative binomial)

There is an identity that can be used here:
{noformat}
1 - I_z(a, b) = I_{1-z}(b, a)
{noformat}

In both distributions the z value is the probability of success, p. If 1 - p is computed as p approaches zero, the result is not exact. For very small p (2^-54 or smaller) the value 1 - p rounds to 1. For these computations I have therefore used 1 - p only when p >= 0.5, where 1 - p is exact. The aim is to keep p as close as possible to the value input by the user. However, this may not compute the most accurate value for the probability. See the example for the Pascal distribution:
{code:java}
@Override
public double survivalProbability(int x) {
    double ret;
    if (x < 0) {
        ret = 1.0;
    } else if (probabilityOfSuccess >= 0.5) {
        // 1 - p is exact.
        // Use the identity of the regularized beta function: 1 - I_z(a, b) = I_{1-z}(b, a)
        ret = RegularizedBeta.value(1.0 - probabilityOfSuccess, x + 1.0, numberOfSuccesses);
    } else {
        ret = 1.0 - RegularizedBeta.value(probabilityOfSuccess, numberOfSuccesses, x + 1.0);
    }
    return ret;
}
{code}

Depending on the parameters p and x, either computation may be more accurate. The internals of RegularizedBeta.value actually detect and use this identity:
{code:java}
public static double value(double x,
                           final double a,
                           final double b,
                           double epsilon,
                           int maxIterations) {
    if (...) {
        return Double.NaN;
    } else if (x > (a + 1) / (2 + b + a) &&
               1 - x <= (b + 1) / (2 + b + a)) {
        return 1 - value(1 - x, b, a, epsilon, maxIterations);
    } else {
        // compute ...
    }
}
{code}

I will investigate adding logic to call RegularizedBeta with the most appropriate arguments, so that it avoids the branch where it computes 1 - value. The high-precision unit tests I have already added should detect whether the function is being used correctly.

> Add survival probability function to discrete distributions
> -----------------------------------------------------------
>
>                 Key: STATISTICS-32
>                 URL: https://issues.apache.org/jira/browse/STATISTICS-32
>             Project: Apache Commons Statistics
>          Issue Type: New Feature
>            Reporter: Benjamin W Trent
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Sibling issue to: STATISTICS-31
> It is useful to know the [survival function|https://en.wikipedia.org/wiki/Survival_function] of a number given a discrete distribution.
> While this can be approximated with
> {noformat}
> 1 - cdf(x){noformat}
> , there is an opportunity for greater accuracy in certain distributions.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
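
The floating-point behaviour that the comment above relies on (1 - p collapsing to 1 for very small p, and 1 - p being exact once p >= 0.5) can be checked independently of the library. The following is a minimal sketch, not code from the PR: the class name ComplementCheck and the method isComplementExact are hypothetical, and the check compares the double result against exact BigDecimal arithmetic.

```java
import java.math.BigDecimal;

// Hypothetical helper for illustration only; not part of Commons Statistics.
final class ComplementCheck {

    /**
     * Returns true if the double-precision result of 1 - p equals the
     * exact real value of 1 - p. new BigDecimal(double) converts the
     * binary value exactly, and BigDecimal.subtract does not round.
     */
    static boolean isComplementExact(double p) {
        BigDecimal exact = BigDecimal.ONE.subtract(new BigDecimal(p));
        return new BigDecimal(1.0 - p).compareTo(exact) == 0;
    }

    public static void main(String[] args) {
        double tiny = 0x1.0p-54; // 2^-54

        // For a small enough p the subtraction rounds back to 1.0,
        // so all information about p is lost.
        System.out.println(1.0 - tiny == 1.0);        // true
        System.out.println(isComplementExact(tiny));  // false

        // Below 0.5 the result is generally rounded.
        System.out.println(isComplementExact(0.1));   // false

        // For 0.5 <= p <= 1 the subtraction is exact.
        System.out.println(isComplementExact(0.5));   // true
        System.out.println(isComplementExact(0.75));  // true
        System.out.println(isComplementExact(0.9));   // true
    }
}
```

This only illustrates why the p >= 0.5 branch can pass 1 - p to the beta function without changing the user's parameter; it does not say which branch yields the more accurate probability for a given p and x.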