On 11/28/11 12:18 AM, Sébastien Brisard wrote:
> Hello,
> while working on MATH-711, I think I stumbled upon some
> inconsistencies in the implementation of the Pascal distribution. In
> fact, these might well be a bug, but since I'm a bit rusty on
> probabilities, I would be grateful for some feedback.
> Here is the thing. The header of the Javadoc states that the random
> variable (say X) being represented by this class is the number of
> *failures*. First of all, I'm not questioning this convention, but I
> think the current Javadoc is slightly confusing, because it refers to
> the Wikipedia website, where the opposite convention is adopted (X is
> the number of *successes*). This would not matter too much, but the
> Javadoc would require some alterations -- I think.
> More disturbing: the names of the parameters of the distribution are
> not consistent with the wikipedia page being referred to:
>   - p is the probability of success in both cases: OK,
>   - r is the number of failures in Wikipedia, but the number of
> successes in CM... I would suggest that we at least rename this
> parameter s or something. Except if it is generally accepted by the
> "Pascal community" that the first parameter should always be called r,
> no matter the convention. In which case I would recommend that the
> references to Wikipedia be removed.
>
> Finally, the bug. According to my hand-calcs, I think that with the
> notations of CM, the mean of X is given by
> mean(X) = r * (1 - p) / p,
> while the currently implemented formula is
> r * p / (1 - p),
> which is actually the formula corresponding to the Wikipedia convention.
>
> Here is a very naive piece of code supporting my calcs
> {code}
> public class PascalDistributionDemo {
>     public static void main(String[] args) {
>         final int r = 10;
>         final double p = 0.2;
>         final int numTerms = 1000;
>         final PascalDistribution distribution = new PascalDistribution(r, p);
>         double mean = 0.;
>         for (int k = numTerms - 1; k >= 0; k--) {
>             mean += k * distribution.probability(k);
>         }
>         // The following prints 40.00000000000012
>         System.out.println("Estimate of the mean = " + mean);
>         // The following prints 2.5
>         System.out.println("CM implementation = " +
>                            distribution.getNumericalMean());
>         // The following prints 2.5
>         System.out.println("r * p / (1 - p) = " + (r * p / (1 - p)));
>         // The following prints 40.0
>         System.out.println("r * (1 - p) / p = " + (r * (1 - p) / p));
>     }
> }
> {code}
>
> Again, I'm not an expert in this area, so I might be very, very wrong,
> and I apologize beforehand if that's the case. If I'm right, though, I
> think that
> 1. the Javadoc needs to be seriously rewritten, and the convention
> adopted *must* be stated very clearly,
> 2. the implementation needs thorough verification.

I looked a little more carefully at this.  I think the mean formula
is incorrect, so this is a bug.  Regarding the convention used, it
is clearly stated in the class javadoc, but you are right it should
be made more clear exactly what is being computed.  I am -0 for
changing the implementation, because that would force current users
to switch success/failure.  It would be best, I think to add full
formulas to the javadoc.  I don't think it is necessary to use the
same letters as the references do.

Phil
>
> I'm happy to do that and open a ticket on this issue once you give me the 
> go...
>
> Best,
> Sébastien
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Reply via email to