Re: [math] Inconsistencies (bugs) in PascalDistribution?

2011-12-05 Thread Sébastien Brisard
Hi Mikkel,

 No, you are more than welcome to do the patch and I'll review it :-). Thanks!

 Cheers, Mikkel.

I've committed a correction (r1210359). Could you please review it and
post your comments on the JIRA ticket (MATH-715).
Thanks a lot!
Sébastien


-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



Re: [math] Inconsistencies (bugs) in PascalDistribution?

2011-12-05 Thread Mikkel Meyer Andersen
2011/12/5 Sébastien Brisard sebastien.bris...@m4x.org:
 Hi Mikkel,

 No, you are more than welcome to do the patch and I'll review it :-). Thanks!

 Cheers, Mikkel.

 I've committed a correction (r1210359). Could you please review it and
 post your comments on the JIRA ticket (MATH-715).
 Thanks a lot!
 Sébastien

Dear Sébastien,

It seems like only the test was changed in r1210359 (svn diff -r
1210359). This does _not_ correspond to the log (svn log -r 1210359).
It seems like the changes was done in r1210358. Is that true? svn log
is empty for r1210358. This is not to bash, but merely to try to
stress the importance of keeping a stringent svn history.

Cheers, Mikkel.

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



Re: [math] Inconsistencies (bugs) in PascalDistribution?

2011-12-05 Thread Sébastien Brisard
Hi Mikkel,

 This is not to bash, but merely to try to
 stress the importance of keeping a stringent svn history.

 Cheers, Mikkel.

Thanks for pointing that out. I have run into numerous problems with
SVN this morning, and, although I'm sure I've committed with a log
(it's actually still recorded in eclipse), I would not be surprised if
this log had gone. I will correct that immediately. Thanks again,
Sébastien


-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



Re: [math] Inconsistencies (bugs) in PascalDistribution?

2011-12-05 Thread Sébastien Brisard
Hi Mikkel,
 It seems like only the test was changed in r1210359 (svn diff -r
 1210359). This does _not_ correspond to the log (svn log -r 1210359).

Here is an excerpt of the email sent automatically when I committed
revision 1210359

==BEGIN EXCERPT==
Author: celestin
Date: Mon Dec  5 08:15:38 2011
New Revision: 1210359

URL: http://svn.apache.org/viewvc?rev=1210359view=rev
Log:
- Corrected expressions for mean and variance in
distribution.PascalDistribution (MATH-715).
- Made javadoc more explicit
- Restored SVN properties to various files in package distribution.



Modified: 
commons/proper/math/trunk/src/main/java/org/apache/commons/math/distribution/PascalDistribution.java
URL: 
http://svn.apache.org/viewvc/commons/proper/math/trunk/src/main/java/org/apache/commons/math/distribution/PascalDistribution.java?rev=1210359r1=1210358r2=1210359view=diff
==
--- 
commons/proper/math/trunk/src/main/java/org/apache/commons/math/distribution/PascalDistribution.java
(original)
+++ 
commons/proper/math/trunk/src/main/java/org/apache/commons/math/distribution/PascalDistribution.java
Mon Dec  5 08:15:38 2011
@@ -31,10 +31,25 @@ import org.apache.commons.math.util.Fast
 * /p
 * p
 * There are various ways to express the probability mass and distribution
- * functions for the Pascal distribution.  The convention employed by the
- * library is to express these functions in terms of the number of failures in
- * a Bernoulli experiment
- * (see a 
href=http://en.wikipedia.org/wiki/Negative_binomial_distribution#Waiting_time_in_a_Bernoulli_process;Waiting
Time in a Bernoulli Process/a).
+ * functions for the Pascal distribution. The present implementation represents
+ * the distribution of the number of failures before {@code r} successes occur.
+ * This is the convention adopted in e.g.
+ * a 
href=http://mathworld.wolfram.com/NegativeBinomialDistribution.html;MathWorld/a,
+ * but emnot/em in
+ * a 
href=http://en.wikipedia.org/wiki/Negative_binomial_distribution;Wikipedia/a.
+ * /p
+ * p
+ * For a random variable {@code X} whose values are distributed
according to this
+ * distribution, the probability mass function is given bybr/
+ * {@code P(X = k) = C(k + r - 1, r - 1) * p^r * (1 - p)^k,}br/
+ * where {@code r} is the number of successes, {@code p} is the probability of
+ * success, and {@code X} is the total number of failures. {@code C(n, k)} is
+ * the binomial coefficient ({@code n} choose {@code k}). The mean and variance
+ * of {@code X} arebr/
+ * {@code E(X) = (1 - p) * r / p, var(X) = (1 - p) * r / p^2.}br/
+ * Finally, the cumulative distribution function is given bybr/
+ * {@code P(X = k) = I(p, r, k + 1)},
+ * where I is the regularized incomplete Beta function.
 * /p
 *
 * @see a href=http://en.wikipedia.org/wiki/Negative_binomial_distribution;
@@ -159,25 +174,24 @@ public class PascalDistribution extends
 * {@inheritDoc}
 *
 * For number of successes {@code r} and probability of success {@code p},
- * the mean is {@code (r * p) / (1 - p)}.
+ * the mean is {@code r * (1 - p) / p}.
 */
public double getNumericalMean() {
final double p = getProbabilityOfSuccess();
final double r = getNumberOfSuccesses();
-return (r * p) / (1 - p);
+return (r * (1 - p)) / p;
}

/**
 * {@inheritDoc}
 *
 * For number of successes {@code r} and probability of success {@code p},
- * the variance is {@code (r * p) / (1 - p)^2}.
+ * the variance is {@code r * (1 - p) / p^2}.
 */
public double getNumericalVariance() {
final double p = getProbabilityOfSuccess();
final double r = getNumberOfSuccesses();
-final double pInv = 1 - p;
-return (r * p) / (pInv * pInv);
+return r * (1 - p) / (p * p);
}

==END EXCERPT==

It seems to me that everything (?) is OK?


 It seems like the changes was done in r1210358. Is that true?

I don't think so (see extract below).

==BEGIN EXCERPT==

Author: sgoeschl
Date: Mon Dec  5 08:02:46 2011
New Revision: 1210358

URL: http://svn.apache.org/viewvc?rev=1210358view=rev
Log:
[maven-release-plugin] prepare for next development iteration

Modified:
   commons/proper/email/trunk/pom.xml

==END EXCERPT==

I don't really understand what is happening. I hope I didn't mess up
anything this morning. Please keep me posted.
Sébastien


-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



Re: [math] Inconsistencies (bugs) in PascalDistribution?

2011-12-05 Thread Mikkel Meyer Andersen
2011/12/5 Sébastien Brisard sebastien.bris...@m4x.org:
 Hi Mikkel,
 It seems like only the test was changed in r1210359 (svn diff -r
 1210359). This does _not_ correspond to the log (svn log -r 1210359).

 Here is an excerpt of the email sent automatically when I committed
 revision 1210359

 ==BEGIN EXCERPT==
 Author: celestin
 Date: Mon Dec  5 08:15:38 2011
 New Revision: 1210359

 URL: http://svn.apache.org/viewvc?rev=1210359view=rev
 Log:
 - Corrected expressions for mean and variance in
 distribution.PascalDistribution (MATH-715).
 - Made javadoc more explicit
 - Restored SVN properties to various files in package distribution.

 

 Modified: 
 commons/proper/math/trunk/src/main/java/org/apache/commons/math/distribution/PascalDistribution.java
 URL: 
 http://svn.apache.org/viewvc/commons/proper/math/trunk/src/main/java/org/apache/commons/math/distribution/PascalDistribution.java?rev=1210359r1=1210358r2=1210359view=diff
 ==
 --- 
 commons/proper/math/trunk/src/main/java/org/apache/commons/math/distribution/PascalDistribution.java
 (original)
 +++ 
 commons/proper/math/trunk/src/main/java/org/apache/commons/math/distribution/PascalDistribution.java
 Mon Dec  5 08:15:38 2011
 @@ -31,10 +31,25 @@ import org.apache.commons.math.util.Fast
  * /p
  * p
  * There are various ways to express the probability mass and distribution
 - * functions for the Pascal distribution.  The convention employed by the
 - * library is to express these functions in terms of the number of failures 
 in
 - * a Bernoulli experiment
 - * (see a 
 href=http://en.wikipedia.org/wiki/Negative_binomial_distribution#Waiting_time_in_a_Bernoulli_process;Waiting
 Time in a Bernoulli Process/a).
 + * functions for the Pascal distribution. The present implementation 
 represents
 + * the distribution of the number of failures before {@code r} successes 
 occur.
 + * This is the convention adopted in e.g.
 + * a 
 href=http://mathworld.wolfram.com/NegativeBinomialDistribution.html;MathWorld/a,
 + * but emnot/em in
 + * a 
 href=http://en.wikipedia.org/wiki/Negative_binomial_distribution;Wikipedia/a.
 + * /p
 + * p
 + * For a random variable {@code X} whose values are distributed
 according to this
 + * distribution, the probability mass function is given bybr/
 + * {@code P(X = k) = C(k + r - 1, r - 1) * p^r * (1 - p)^k,}br/
 + * where {@code r} is the number of successes, {@code p} is the probability 
 of
 + * success, and {@code X} is the total number of failures. {@code C(n, k)} is
 + * the binomial coefficient ({@code n} choose {@code k}). The mean and 
 variance
 + * of {@code X} arebr/
 + * {@code E(X) = (1 - p) * r / p, var(X) = (1 - p) * r / p^2.}br/
 + * Finally, the cumulative distribution function is given bybr/
 + * {@code P(X = k) = I(p, r, k + 1)},
 + * where I is the regularized incomplete Beta function.
  * /p
  *
  * @see a href=http://en.wikipedia.org/wiki/Negative_binomial_distribution;
 @@ -159,25 +174,24 @@ public class PascalDistribution extends
     * {@inheritDoc}
     *
     * For number of successes {@code r} and probability of success {@code p},
 -     * the mean is {@code (r * p) / (1 - p)}.
 +     * the mean is {@code r * (1 - p) / p}.
     */
    public double getNumericalMean() {
        final double p = getProbabilityOfSuccess();
        final double r = getNumberOfSuccesses();
 -        return (r * p) / (1 - p);
 +        return (r * (1 - p)) / p;
    }

    /**
     * {@inheritDoc}
     *
     * For number of successes {@code r} and probability of success {@code p},
 -     * the variance is {@code (r * p) / (1 - p)^2}.
 +     * the variance is {@code r * (1 - p) / p^2}.
     */
    public double getNumericalVariance() {
        final double p = getProbabilityOfSuccess();
        final double r = getNumberOfSuccesses();
 -        final double pInv = 1 - p;
 -        return (r * p) / (pInv * pInv);
 +        return r * (1 - p) / (p * p);
    }

 ==END EXCERPT==

 It seems to me that everything (?) is OK?


 It seems like the changes was done in r1210358. Is that true?

 I don't think so (see extract below).

 ==BEGIN EXCERPT==

 Author: sgoeschl
 Date: Mon Dec  5 08:02:46 2011
 New Revision: 1210358

 URL: http://svn.apache.org/viewvc?rev=1210358view=rev
 Log:
 [maven-release-plugin] prepare for next development iteration

 Modified:
   commons/proper/email/trunk/pom.xml

 ==END EXCERPT==

 I don't really understand what is happening. I hope I didn't mess up
 anything this morning. Please keep me posted.
 Sébastien

Hi,

Now I looked it through and it looks good to me - and also the SVN
behaves fine again (it could also be mine that acted weird).

Thanks again for fixing this.

Cheers, Mikkel.

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: 

Re: [math] Inconsistencies (bugs) in PascalDistribution?

2011-11-29 Thread Mikkel Meyer Andersen
2011/11/29 Sébastien Brisard sebastien.bris...@m4x.org:
 Hi Mikkel

 Thanks for discovering this! I did the implementation, and apparently
 assumed notation as the referred source. Sorry for this.

 I think there is nothing wrong in the source. Only, you adopted a
 different definition of the random variable represented by the
 implemented PascalDistribution:
 1. In wikipedia, the r-parameter is the total number of failures,
 2. In the CM implementation, the Javadoc states that r is the total
 number of successes, and the PMF is (correctly) derived accordingly.

 We should also check the variance - this might be suffering from wrong
 notation as well.

 Yes, I'll do that, too!

 Cheers, Mikkel.


 Now is not a good time to work on o.a.c.m.distribution, since
 Christian is working on MATH-703. Once this is done, I'll commit a
 patch to correct this. You will be very welcome to review the changes
 (except of course if you want to alter your baby yourself ;) ).
No, you are more than welcome to do the patch and I'll review it :-). Thanks!

Cheers, Mikkel.
 Sébastien


 -
 To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
 For additional commands, e-mail: dev-h...@commons.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



Re: [math] Inconsistencies (bugs) in PascalDistribution?

2011-11-28 Thread Phil Steitz
On 11/28/11 12:18 AM, Sébastien Brisard wrote:
 Hello,
 while working on MATH-711, I think I stumbled upon some
 inconsistencies in the implementation of the Pascal distribution. In
 fact, these might well be a bug, but since I'm a bit rusty on
 probabilities, I would be grateful for some feedback.
 Here is the thing. The header of the Javadoc states that the random
 variable (say X) being represented by this class is the number of
 *failures*. First of all, I'm not questioning this convention, but I
 think the current Javadoc is slightly confusing, because it refers to
 the Wikipedia website, where the opposite convention is adopted (X is
 the number of *successes*). This would not matter too much, but the
 Javadoc would require some alterations -- I think.
 More disturbing: the names of the parameters of the distribution are
 not consistent with the wikipedia page being referred to:
   - p is the probability of success in both cases: OK,
   - r is the number of failures in Wikipedia, but the number of
 successes in CM... I would suggest that we at least rename this
 parameter s or something. Except if it is generally accepted by the
 Pascal community that the first parameter should always be called r,
 no matter the convention. In which case I would recommend that the
 references to Wikipedia be removed.

 Finally, the bug. According to my hand-calcs, I think that with the
 notations of CM, the mean of X is given by
 mean(X) = r * (1 - p) / p,
 while the currently implemented formula is
 r * p / (1 - p),
 which is actually the formula corresponding to the Wikipedia convention.

 Here is a very naive piece of code supporting my calcs
 {code}
 public class PascalDistributionDemo {
 public static void main(String[] args) {
 final int r = 10;
 final double p = 0.2;
 final int numTerms = 1000;
 final PascalDistribution distribution = new PascalDistribution(r, p);
 double mean = 0.;
 for (int k = numTerms - 1; k = 0; k--) {
 mean += k * distribution.probability(k);
 }
 // The following prints 40.12
 System.out.println(Estimate of the mean =  + mean);
 // The following prints 2.5
 System.out.println(CM implementation =  +
distribution.getNumericalMean());
 // The following prints 2.5
 System.out.println(r * p / (1 - p) =  + (r * p / (1 - p)));
 // The following prints 40.0
 System.out.println(r * (1 - p) / p =  + (r * (1 - p) / p));
 }
 }
 {code}

 Again, I'm not an expert in this area, so I might be very, very wrong,
 and I apologize beforehand if that's the case. If I'm right, though, I
 think that
 1. the Javadoc needs to be seriously rewritten, and the convention
 adopted *must* be stated very clearly,
 2. the implementation needs thorough verification.

 I'm happy to do that and open a ticket on this issue once you give me the 
 go...

Go ahead an open a ticket. 

Phil

 Best,
 Sébastien


 -
 To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
 For additional commands, e-mail: dev-h...@commons.apache.org




-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



Re: [math] Inconsistencies (bugs) in PascalDistribution?

2011-11-28 Thread Phil Steitz
On 11/28/11 12:18 AM, Sébastien Brisard wrote:
 Hello,
 while working on MATH-711, I think I stumbled upon some
 inconsistencies in the implementation of the Pascal distribution. In
 fact, these might well be a bug, but since I'm a bit rusty on
 probabilities, I would be grateful for some feedback.
 Here is the thing. The header of the Javadoc states that the random
 variable (say X) being represented by this class is the number of
 *failures*. First of all, I'm not questioning this convention, but I
 think the current Javadoc is slightly confusing, because it refers to
 the Wikipedia website, where the opposite convention is adopted (X is
 the number of *successes*). This would not matter too much, but the
 Javadoc would require some alterations -- I think.
 More disturbing: the names of the parameters of the distribution are
 not consistent with the wikipedia page being referred to:
   - p is the probability of success in both cases: OK,
   - r is the number of failures in Wikipedia, but the number of
 successes in CM... I would suggest that we at least rename this
 parameter s or something. Except if it is generally accepted by the
 Pascal community that the first parameter should always be called r,
 no matter the convention. In which case I would recommend that the
 references to Wikipedia be removed.

 Finally, the bug. According to my hand-calcs, I think that with the
 notations of CM, the mean of X is given by
 mean(X) = r * (1 - p) / p,
 while the currently implemented formula is
 r * p / (1 - p),
 which is actually the formula corresponding to the Wikipedia convention.

 Here is a very naive piece of code supporting my calcs
 {code}
 public class PascalDistributionDemo {
 public static void main(String[] args) {
 final int r = 10;
 final double p = 0.2;
 final int numTerms = 1000;
 final PascalDistribution distribution = new PascalDistribution(r, p);
 double mean = 0.;
 for (int k = numTerms - 1; k = 0; k--) {
 mean += k * distribution.probability(k);
 }
 // The following prints 40.12
 System.out.println(Estimate of the mean =  + mean);
 // The following prints 2.5
 System.out.println(CM implementation =  +
distribution.getNumericalMean());
 // The following prints 2.5
 System.out.println(r * p / (1 - p) =  + (r * p / (1 - p)));
 // The following prints 40.0
 System.out.println(r * (1 - p) / p =  + (r * (1 - p) / p));
 }
 }
 {code}

 Again, I'm not an expert in this area, so I might be very, very wrong,
 and I apologize beforehand if that's the case. If I'm right, though, I
 think that
 1. the Javadoc needs to be seriously rewritten, and the convention
 adopted *must* be stated very clearly,
 2. the implementation needs thorough verification.

I looked a little more carefully at this.  I think the mean formula
is incorrect, so this is a bug.  Regarding the convention used, it
is clearly stated in the class javadoc, but you are right it should
be made more clear exactly what is being computed.  I am -0 for
changing the implementation, because that would force current users
to switch success/failure.  It would be best, I think to add full
formulas to the javadoc.  I don't think it is necessary to use the
same letters as the references do.

Phil

 I'm happy to do that and open a ticket on this issue once you give me the 
 go...

 Best,
 Sébastien


 -
 To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
 For additional commands, e-mail: dev-h...@commons.apache.org




-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



Re: [math] Inconsistencies (bugs) in PascalDistribution?

2011-11-28 Thread Mikkel Meyer Andersen
2011/11/28 Phil Steitz phil.ste...@gmail.com:
 On 11/28/11 12:18 AM, Sébastien Brisard wrote:
 Hello,
 while working on MATH-711, I think I stumbled upon some
 inconsistencies in the implementation of the Pascal distribution. In
 fact, these might well be a bug, but since I'm a bit rusty on
 probabilities, I would be grateful for some feedback.
 Here is the thing. The header of the Javadoc states that the random
 variable (say X) being represented by this class is the number of
 *failures*. First of all, I'm not questioning this convention, but I
 think the current Javadoc is slightly confusing, because it refers to
 the Wikipedia website, where the opposite convention is adopted (X is
 the number of *successes*). This would not matter too much, but the
 Javadoc would require some alterations -- I think.
 More disturbing: the names of the parameters of the distribution are
 not consistent with the wikipedia page being referred to:
   - p is the probability of success in both cases: OK,
   - r is the number of failures in Wikipedia, but the number of
 successes in CM... I would suggest that we at least rename this
 parameter s or something. Except if it is generally accepted by the
 Pascal community that the first parameter should always be called r,
 no matter the convention. In which case I would recommend that the
 references to Wikipedia be removed.

 Finally, the bug. According to my hand-calcs, I think that with the
 notations of CM, the mean of X is given by
 mean(X) = r * (1 - p) / p,
 while the currently implemented formula is
 r * p / (1 - p),
 which is actually the formula corresponding to the Wikipedia convention.

 Here is a very naive piece of code supporting my calcs
 {code}
 public class PascalDistributionDemo {
     public static void main(String[] args) {
         final int r = 10;
         final double p = 0.2;
         final int numTerms = 1000;
         final PascalDistribution distribution = new PascalDistribution(r, p);
         double mean = 0.;
         for (int k = numTerms - 1; k = 0; k--) {
             mean += k * distribution.probability(k);
         }
         // The following prints 40.12
         System.out.println(Estimate of the mean =  + mean);
         // The following prints 2.5
         System.out.println(CM implementation =  +
                            distribution.getNumericalMean());
         // The following prints 2.5
         System.out.println(r * p / (1 - p) =  + (r * p / (1 - p)));
         // The following prints 40.0
         System.out.println(r * (1 - p) / p =  + (r * (1 - p) / p));
     }
 }
 {code}

 Again, I'm not an expert in this area, so I might be very, very wrong,
 and I apologize beforehand if that's the case. If I'm right, though, I
 think that
 1. the Javadoc needs to be seriously rewritten, and the convention
 adopted *must* be stated very clearly,
 2. the implementation needs thorough verification.

 I looked a little more carefully at this.  I think the mean formula
 is incorrect, so this is a bug.  Regarding the convention used, it
 is clearly stated in the class javadoc, but you are right it should
 be made more clear exactly what is being computed.  I am -0 for
 changing the implementation, because that would force current users
 to switch success/failure.  It would be best, I think to add full
 formulas to the javadoc.  I don't think it is necessary to use the
 same letters as the references do.

 Phil

 I'm happy to do that and open a ticket on this issue once you give me the 
 go...

 Best,
 Sébastien


Thanks for discovering this! I did the implementation, and apparently
assumed notation as the referred source. Sorry for this.

We should also check the variance - this might be suffering from wrong
notation as well.

Cheers, Mikkel.

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



Re: [math] Inconsistencies (bugs) in PascalDistribution?

2011-11-28 Thread Sébastien Brisard
Hi Mikkel

 Thanks for discovering this! I did the implementation, and apparently
 assumed notation as the referred source. Sorry for this.

I think there is nothing wrong in the source. Only, you adopted a
different definition of the random variable represented by the
implemented PascalDistribution:
1. In wikipedia, the r-parameter is the total number of failures,
2. In the CM implementation, the Javadoc states that r is the total
number of successes, and the PMF is (correctly) derived accordingly.

 We should also check the variance - this might be suffering from wrong
 notation as well.

Yes, I'll do that, too!

 Cheers, Mikkel.


Now is not a good time to work on o.a.c.m.distribution, since
Christian is working on MATH-703. Once this is done, I'll commit a
patch to correct this. You will be very welcome to review the changes
(except of course if you want to alter your baby yourself ;) ).
Sébastien


-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org