Hello,
while working on MATH-711, I think I stumbled upon some
inconsistencies in the implementation of the Pascal distribution. In
fact, these might well be a bug, but since I'm a bit rusty on
probabilities, I would be grateful for some feedback.
Here is the thing. The header of the Javadoc states that the random
variable (say X) being represented by this class is the number of
*failures*. First of all, I'm not questioning this convention, but I
think the current Javadoc is slightly confusing, because it refers to
the Wikipedia website, where the opposite convention is adopted (X is
the number of *successes*). This would not matter too much, but the
Javadoc would require some alterations -- I think.
More disturbing: the names of the parameters of the distribution are
not consistent with the wikipedia page being referred to:
  - p is the probability of success in both cases: OK,
  - r is the number of failures in Wikipedia, but the number of
successes in CM... I would suggest that we at least rename this
parameter s or something. Except if it is generally accepted by the
"Pascal community" that the first parameter should always be called r,
no matter the convention. In which case I would recommend that the
references to Wikipedia be removed.

Finally, the bug. According to my hand-calcs, I think that with the
notations of CM, the mean of X is given by
mean(X) = r * (1 - p) / p,
while the currently implemented formula is
r * p / (1 - p),
which is actually the formula corresponding to the Wikipedia convention.

Here is a very naive piece of code supporting my calcs
{code}
public class PascalDistributionDemo {
    public static void main(String[] args) {
        final int r = 10;
        final double p = 0.2;
        final int numTerms = 1000;
        final PascalDistribution distribution = new PascalDistribution(r, p);
        double mean = 0.;
        for (int k = numTerms - 1; k >= 0; k--) {
            mean += k * distribution.probability(k);
        }
        // The following prints 40.00000000000012
        System.out.println("Estimate of the mean = " + mean);
        // The following prints 2.5
        System.out.println("CM implementation = " +
                           distribution.getNumericalMean());
        // The following prints 2.5
        System.out.println("r * p / (1 - p) = " + (r * p / (1 - p)));
        // The following prints 40.0
        System.out.println("r * (1 - p) / p = " + (r * (1 - p) / p));
    }
}
{code}

Again, I'm not an expert in this area, so I might be very, very wrong,
and I apologize beforehand if that's the case. If I'm right, though, I
think that
1. the Javadoc needs to be seriously rewritten, and the convention
adopted *must* be stated very clearly,
2. the implementation needs thorough verification.

I'm happy to do that and open a ticket on this issue once you give me the go...

Best,
Sébastien


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Reply via email to