[ 
https://issues.apache.org/jira/browse/STATISTICS-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17447265#comment-17447265
 ] 

Alex Herbert commented on STATISTICS-47:
----------------------------------------

I have added support for the inverse SF to the master branch.

Some bugs in the inverse CDF were found for the discrete distributions that use 
an explicit inversion of the forward CDF function, namely the Geometric and 
Uniform distributions. They applied appropriate rounding at the upper bound of 
cdf( x ) but not at the lower bound.

Tests have been added for the discrete distributions to ensure the inversion is 
correct at the boundary change. The following conditions are now tested using 
single ULP changes in the probability value passed to the inverse function 
where appropriate:
{noformat}
     icdf( cdf(x) )                 = x
     icdf( p > cdf(x) )            >= x+1
     icdf( cdf(x-1) < p < cdf(x) )  = x

     isf( sf(x) )                   = x
     isf( p < sf(x) )              >= x+1
     isf( sf(x-1) > p > sf(x) )     = x
{noformat}
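As an illustration only, these conditions can be sketched without the library. 
The class below is hypothetical: it is not the Commons Statistics 
implementation, and it assumes a discrete uniform distribution on [0, 10] with 
inverses found by a simple linear search. Single ULP changes are made with 
Math.nextUp and Math.nextDown.

```java
public class UlpBoundaryDemo {
    // Hypothetical stand-in for a discrete uniform distribution on [0, 10].
    static final int LOWER = 0;
    static final int UPPER = 10;
    static final int N = UPPER - LOWER + 1;

    static double cdf(int x) {
        if (x < LOWER) return 0;
        if (x >= UPPER) return 1;
        return (x - LOWER + 1.0) / N;
    }

    static double sf(int x) {
        if (x < LOWER) return 1;
        if (x >= UPPER) return 0;
        return (double) (UPPER - x) / N;
    }

    /** Smallest x with cdf(x) >= p (linear search, for clarity only). */
    static int icdf(double p) {
        for (int x = LOWER; x < UPPER; x++) {
            if (cdf(x) >= p) return x;
        }
        return UPPER;
    }

    /** Smallest x with sf(x) <= p (linear search, for clarity only). */
    static int isf(double p) {
        for (int x = LOWER; x < UPPER; x++) {
            if (sf(x) <= p) return x;
        }
        return UPPER;
    }

    /** Checks the six boundary conditions for every x below the upper bound. */
    static boolean boundariesHold() {
        for (int x = LOWER; x < UPPER; x++) {
            double p = cdf(x);
            double q = sf(x);
            boolean ok = icdf(p) == x
                && icdf(Math.nextUp(p)) == x + 1    // p just above cdf(x)
                && icdf(Math.nextDown(p)) == x      // p just below cdf(x)
                && isf(q) == x
                && isf(Math.nextDown(q)) == x + 1   // p just below sf(x)
                && isf(Math.nextUp(q)) == x;        // p just above sf(x)
            if (!ok) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("boundaries hold: " + boundariesHold());
    }
}
```

A linear search satisfies the conditions by construction. An explicit 
inversion formula has to add rounding corrections at both bounds to match it.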
The distributions are now limited by the precision of the function that is 
being inverted. Thus it is possible that the following are not true:
{noformat}
     icdf( 1 - sf(x) )      = x
     isf( 1 - cdf(x) )      = x
{noformat}
An example of such behaviour is demonstrated by the discrete uniform 
distribution:
{code:java}
UniformDiscreteDistribution dist = UniformDiscreteDistribution.of(0, 10);

// The value under test; also used by the printf calls below.
int x = 8;
double p = dist.cumulativeProbability(x);
double q = dist.survivalProbability(x);
int xp = dist.inverseCumulativeProbability(p);
int xq = dist.inverseSurvivalProbability(q);
int xx = dist.inverseSurvivalProbability(1 - p);

System.out.printf("%s + %s = %s  (q ~ %s)%n", p, q, p + q, 1 - p);
System.out.printf("icdf( cdf(%d) )    = %d%n", x, xp);
System.out.printf("isf( sf(%d) )      = %d%n", x, xq);
System.out.printf("isf( 1 - cdf(%d) ) = %d%n", x, xx);

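// Reference values at 34 significant digits. Note that cdf(8) on [0, 10] is
// exactly 9/11; the numerator 8 used for bp is why the printed p error is
// about -1/11 rather than a rounding-level value.
BigDecimal bp = BigDecimal.valueOf(8).divide(BigDecimal.valueOf(11),
    MathContext.DECIMAL128);
BigDecimal bq = BigDecimal.valueOf(2).divide(BigDecimal.valueOf(11),
    MathContext.DECIMAL128);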
System.out.printf("p error %s : q error = %s%n",
    bp.subtract(new BigDecimal(p)).doubleValue(),
    bq.subtract(new BigDecimal(q)).doubleValue());
{code}
{noformat}
0.8181818181818182 + 0.18181818181818182 = 1.0  (q ~ 0.18181818181818177)
icdf( cdf(8) )    = 8
isf( sf(8) )      = 8
isf( 1 - cdf(8) ) = 9
p error -0.09090909090909095 : q error = -5.046468293750712E-18
{noformat}
Inverting the complement of the CDF does not return the same value. Here 
1 - cdf( 8 ) evaluates to 0.18181818181818177, which is slightly smaller than 
sf( 8 ) = 0.18181818181818182 because the CDF value has less absolute precision 
than the SF value. Interpreted as a survival probability it satisfies 
p < sf( x ), so the inverse SF returns x+1.

This behaviour may be confusing but is a result of the reduced precision of a 
probability value as it approaches 1, where adjacent doubles are spaced further 
apart than they are near 0. It is always better to use the CDF and inverse CDF 
if interested in cumulative probabilities, and the SF and inverse SF if 
interested in survival probabilities.
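The rounding effect can be reproduced with plain double arithmetic. The sketch 
below is hypothetical and does not use the library: it assumes 
cdf( 8 ) = 9.0 / 11 and sf( 8 ) = 2.0 / 11 for the uniform distribution on 
[0, 10], and inverts the SF with a linear search.

```java
public class ComplementDemo {
    /** Smallest x in [0, 10] with sf(x) = (10 - x) / 11 <= p. */
    static int isf(double p) {
        for (int x = 0; x < 10; x++) {
            if ((10 - x) / 11.0 <= p) return x;
        }
        return 10;
    }

    public static void main(String[] args) {
        double p = 9.0 / 11;   // cdf(8), rounded to the nearest double
        double q = 2.0 / 11;   // sf(8), rounded to the nearest double
        double c = 1 - p;      // complement of the CDF; inherits the rounding of p

        System.out.println("1 - p < q      : " + (c < q));
        System.out.println("isf( q )       = " + isf(q));
        System.out.println("isf( 1 - p )   = " + isf(c));
    }
}
```

Because 1 - p lands just below q, the search skips x = 8 and stops at x = 9, 
matching the isf( 1 - cdf(8) ) result shown in the output above.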

 

> Add an inverse for the SurvivalProbability
> ------------------------------------------
>
>                 Key: STATISTICS-47
>                 URL: https://issues.apache.org/jira/browse/STATISTICS-47
>             Project: Apache Commons Statistics
>          Issue Type: Improvement
>          Components: distribution
>    Affects Versions: 1.0
>            Reporter: Alex Herbert
>            Priority: Major
>
> The distributions currently have an inverse for the cumulative probability 
> but not the complement (the survival probability). For example for the 
> ContinuousDistribution interface:
> {code:java}
> double cumulativeProbability(double x);
> double survivalProbability(double x);
> double inverseCumulativeProbability(double p); {code}
> Add:
> {code:java}
> double inverseSurvivalProbability(double p);{code}
> It should be possible to update the implementation in the abstract base class 
> for the distributions to support using either the CDF or SF for the search 
> allowing both to be implemented with the same algorithm.
> This would be of benefit for distributions which support a high precision 
> survival function, e.g.
> {code:java}
> final ContinuousDistribution dist = NormalDistribution.of(0, 1);
> double x = 10;
> double p = dist.survivalProbability(x);
> System.out.printf("x = %s%np = sf(x) = %s%n%n icdf(1-p) = %s%n%n-icdf(p) = %s%n",
>     x, p, dist.inverseCumulativeProbability(1 - p),
>     -dist.inverseCumulativeProbability(p));
> {code}
> Prints:
> {noformat}
> x = 10.0
> p = sf(x) = 7.619853024160595E-24
>  icdf(1-p) = Infinity
> -icdf(p) = 10.0
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
