On 11/28/11 12:18 AM, Sébastien Brisard wrote: > Hello, > while working on MATH-711, I think I stumbled upon some > inconsistencies in the implementation of the Pascal distribution. In > fact, these might well be a bug, but since I'm a bit rusty on > probabilities, I would be grateful for some feedback. > Here is the thing. The header of the Javadoc states that the random > variable (say X) being represented by this class is the number of > *failures*. First of all, I'm not questioning this convention, but I > think the current Javadoc is slightly confusing, because it refers to > the Wikipedia website, where the opposite convention is adopted (X is > the number of *successes*). This would not matter too much, but the > Javadoc would require some alterations -- I think. > More disturbing: the names of the parameters of the distribution are > not consistent with the wikipedia page being referred to: > - p is the probability of success in both cases: OK, > - r is the number of failures in Wikipedia, but the number of > successes in CM... I would suggest that we at least rename this > parameter s or something. Except if it is generally accepted by the > "Pascal community" that the first parameter should always be called r, > no matter the convention. In which case I would recommend that the > references to Wikipedia be removed. > > Finally, the bug. According to my hand-calcs, I think that with the > notations of CM, the mean of X is given by > mean(X) = r * (1 - p) / p, > while the currently implemented formula is > r * p / (1 - p), > which is actually the formula corresponding to the Wikipedia convention. > > Here is a very naive piece of code supporting my calcs > {code} > public class PascalDistributionDemo { > public static void main(String[] args) { > final int r = 10; > final double p = 0.2; > final int numTerms = 1000; > final PascalDistribution distribution = new PascalDistribution(r, p); > double mean = 0.; > for (int k = numTerms - 1; k >= 0; k--) { > mean += k * distribution.probability(k); > } > // The following prints 40.00000000000012 > System.out.println("Estimate of the mean = " + mean); > // The following prints 2.5 > System.out.println("CM implementation = " + > distribution.getNumericalMean()); > // The following prints 2.5 > System.out.println("r * p / (1 - p) = " + (r * p / (1 - p))); > // The following prints 40.0 > System.out.println("r * (1 - p) / p = " + (r * (1 - p) / p)); > } > } > {code} > > Again, I'm not an expert in this area, so I might be very, very wrong, > and I apologize beforehand if that's the case. If I'm right, though, I > think that > 1. the Javadoc needs to be seriously rewritten, and the convention > adopted *must* be stated very clearly, > 2. the implementation needs thorough verification.
I looked a little more carefully at this. I think the mean formula is incorrect, so this is a bug. Regarding the convention used, it is clearly stated in the class javadoc, but you are right it should be made more clear exactly what is being computed. I am -0 for changing the implementation, because that would force current users to switch success/failure. It would be best, I think to add full formulas to the javadoc. I don't think it is necessary to use the same letters as the references do. Phil > > I'm happy to do that and open a ticket on this issue once you give me the > go... > > Best, > Sébastien > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > For additional commands, e-mail: dev-h...@commons.apache.org > > --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org