[jira] [Commented] (MATH-437) Kolmogorov Smirnov Distribution
[ https://issues.apache.org/jira/browse/MATH-437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989195#comment-13989195 ]

Phil Steitz commented on MATH-437:

Updated user guide and TestUtils in r1592430. All that remains now is removing the deprecated classes in 4.0.

> Kolmogorov Smirnov Distribution
> -------------------------------
>
> Key: MATH-437
> URL: https://issues.apache.org/jira/browse/MATH-437
> Project: Commons Math
> Issue Type: New Feature
> Reporter: Mikkel Meyer Andersen
> Assignee: Phil Steitz
> Priority: Minor
> Fix For: 4.0
> Attachments: MATH437-with-test-take-1, ks-distribution.patch
>
> Original Estimate: 0.25h
> Remaining Estimate: 0.25h
>
> Kolmogorov-Smirnov test (see [1]) is used to test if one sample against a
> known probability density functions or if two samples are from the same
> distribution. To evaluate the test statistic, the Kolmogorov-Smirnov
> distribution is used. Quite good asymptotics exist for the one-sided test,
> but it's more difficult for the two-sided test.
> [1]: http://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (MATH-437) Kolmogorov Smirnov Distribution
[ https://issues.apache.org/jira/browse/MATH-437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13913713#comment-13913713 ]

Phil Steitz commented on MATH-437:

First cut added in r1572335. I don't want to hold up 3.3 release for this, but pre- or post-3.3 still need to:

1. Update user guide
2. Add static methods to TestUtils
[jira] [Commented] (MATH-437) Kolmogorov Smirnov Distribution
[ https://issues.apache.org/jira/browse/MATH-437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907103#comment-13907103 ]

Luc Maisonobe commented on MATH-437:

Sure, we can wait for this issue to be fixed before releasing 3.3 if you think it can be done soon.
[jira] [Commented] (MATH-437) Kolmogorov Smirnov Distribution
[ https://issues.apache.org/jira/browse/MATH-437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907012#comment-13907012 ]

Phil Steitz commented on MATH-437:

I have the first part of this (the new class and deprecation of the old) just about ready to commit. I would like to slide this into 3.3 if I can have until the end of this week to finish it.
[jira] [Commented] (MATH-437) Kolmogorov Smirnov Distribution
[ https://issues.apache.org/jira/browse/MATH-437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13615110#comment-13615110 ]

Luc Maisonobe commented on MATH-437:

+1 to postpone to 4.0.

I have no opinion about the usefulness of the distribution itself, so if you think it would be worth removing this part and having only the test, this would be fine with me.

> Kolmogorov Smirnov Distribution
> -------------------------------
>
> Key: MATH-437
> URL: https://issues.apache.org/jira/browse/MATH-437
> Project: Commons Math
> Issue Type: New Feature
> Reporter: Mikkel Meyer Andersen
> Assignee: Phil Steitz
> Priority: Minor
> Fix For: 3.2
> Attachments: ks-distribution.patch, MATH437-with-test-take-1
>
> Original Estimate: 0.25h
> Remaining Estimate: 0.25h
[jira] [Commented] (MATH-437) Kolmogorov Smirnov Distribution
[ https://issues.apache.org/jira/browse/MATH-437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614601#comment-13614601 ]

Phil Steitz commented on MATH-437:

I think we should bump this to 4.0 or at least 3.3.

It was probably a mistake to put K-S in the distribution package. The K-S distribution itself is of little practical usefulness (to my knowledge at least). I have never seen it used for anything but performing K-S tests. It is tricky enough to compute the distribution function itself with any kind of numerical stability, as the comments above and the literature around K-S tests confirm. Computing moments is, as the reference where Luc (resourcefully!) found test data states, "intractable." I think it may be best to steer clear of this and focus on just getting a good implementation of the test itself, which should move to .inference.

I would prefer to do a little more research, though, to decide how best to set up the API and implementation for the test. It could be that we would be better off not using the cdfs in the current impl, and instead using a beta approximation to compute p-values as in [1]. Note also that since the discussion above and the initial implementation, Simard has published [2] with some empirical findings on how the various K-S approximation methods perform.

So to summarize, I think the first step is to agree on the K-S test API. Then deprecate the class in .distribution and move the test class to .inference.

[1] http://www.ism.ac.jp/editsec/aism/pdf/054_3_0577.pdf
[2] http://www.jstatsoft.org/v39/i11/paper

> Kolmogorov Smirnov Distribution
> -------------------------------
>
> Key: MATH-437
> URL: https://issues.apache.org/jira/browse/MATH-437
> Project: Commons Math
> Issue Type: New Feature
> Reporter: Mikkel Meyer Andersen
> Assignee: Phil Steitz
> Priority: Minor
> Fix For: 3.2
> Attachments: ks-distribution.patch, MATH437-with-test-take-1
>
> Original Estimate: 0.25h
> Remaining Estimate: 0.25h
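As a point of reference for the API discussion in the comment above, here is a minimal sketch of the one-sample statistic that a K-S test class would compute before any p-value approximation is applied. The class and method names are hypothetical and are not taken from the patch or from the code later committed; the reference CDF is assumed to be continuous.

```java
import java.util.Arrays;
import java.util.function.DoubleUnaryOperator;

// Illustrative sketch only. KsStatisticSketch and statistic(...) are
// hypothetical names, not the Commons Math API.
final class KsStatisticSketch {

    /**
     * One-sample statistic D_n = sup_x |F_n(x) - F(x)|, where F_n is the
     * empirical CDF of the sample and F is a continuous reference CDF.
     */
    static double statistic(double[] sample, DoubleUnaryOperator cdf) {
        double[] sorted = sample.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        double d = 0.0;
        for (int i = 0; i < n; i++) {
            double f = cdf.applyAsDouble(sorted[i]);
            // The empirical CDF jumps from i/n to (i+1)/n at sorted[i],
            // so the supremum is attained on one side of a jump.
            double above = (i + 1.0) / n - f;
            double below = f - (double) i / n;
            d = Math.max(d, Math.max(above, below));
        }
        return d;
    }
}
```

A call such as `KsStatisticSketch.statistic(sample, x -> x)` would compare a sample against the uniform distribution on [0, 1]. Turning the statistic into a p-value is the part that needs the K-S distribution (or an approximation such as the one in [1]) and is deliberately left out here.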
[jira] [Commented] (MATH-437) Kolmogorov Smirnov Distribution
[ https://issues.apache.org/jira/browse/MATH-437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604413#comment-13604413 ]

Phil Steitz commented on MATH-437:

Pushing this to top of my [math] list. The patch makes great use of our numerics :) but I have to think there are better ways to compute these things. Also, we need to separate the test class, which should go into .stat.inference, from the distribution class.

If this is the last bug holding up 3.2, feel free to bump to 4.0 or 3.3.

> Kolmogorov Smirnov Distribution
> -------------------------------
>
> Key: MATH-437
> URL: https://issues.apache.org/jira/browse/MATH-437
> Project: Commons Math
> Issue Type: New Feature
> Reporter: Mikkel Meyer Andersen
> Priority: Minor
> Fix For: 3.2
> Attachments: ks-distribution.patch, MATH437-with-test-take-1
>
> Original Estimate: 0.25h
> Remaining Estimate: 0.25h
[jira] Commented: (MATH-437) Kolmogorov Smirnov Distribution
[ https://issues.apache.org/jira/browse/MATH-437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008936#comment-13008936 ]

Phil Steitz commented on MATH-437:

+1 to commit as is, adding some algorithm notes to the class javadoc and the MATH-435 power impl.

I am ambivalent on whether or not to "fix" the error in Marsaglia's code that is apparently included in R. Having the verification tests is good, though, so I would leave as is in the patch, since the Marsaglia C impl can be seen as a reference in this case. I can see the other side of the argument here, though, and would be fine with just going with the fixed code, suitably documented. What do others think about this?

It looks like you forgot to add the references to the class javadoc for the impl class.

Per comment on MATH-435, I think we should add the matrix power impl there and use it here.

> Kolmogorov Smirnov Distribution
> -------------------------------
>
> Key: MATH-437
> URL: https://issues.apache.org/jira/browse/MATH-437
> Project: Commons Math
> Issue Type: New Feature
> Reporter: Mikkel Meyer Andersen
> Assignee: Mikkel Meyer Andersen
> Priority: Minor
> Fix For: 3.0
> Attachments: MATH437-with-test-take-1
>
> Original Estimate: 0.25h
> Remaining Estimate: 0.25h
[jira] Commented: (MATH-437) Kolmogorov Smirnov Distribution
[ https://issues.apache.org/jira/browse/MATH-437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12975771#action_12975771 ]

Mikkel Meyer Andersen commented on MATH-437:

In the past months, I've communicated with both Richard Simard and George Marsaglia regarding a small disagreement between the theory in Marsaglia's article and the actual implementation: the article requires 0 <= h < 1, while the code yields 0 < h <= 1.

I wrote to Marsaglia regarding this, and his answer was:
{quote}
The Kolmogorov distribution comes from a piecewise polynomial in h with knots at 1/2n, 2/2n,...,(2n-1)/2n, with each segment assumed to start with h=0. Although I emphasized that 0<= h <1 in the article, I overlooked the need for ensuring that in the C code, and apparently so did my colleagues. Sorry about that.
{quote}

This means that his code has to be changed slightly to ensure that 0 <= h < 1. Simard argues that this shouldn't matter because the KS distribution is continuous, but if one wants to correct it, one should
{quote}
Instead of taking the floor(n*d + 1) and making this correction for h = 1, take the ceiling (n*d).
{quote}

I would prefer using ceiling(n*d) instead of the original (incorrect) floor(n*d + 1), despite the continuity argument. So my plan is to do this (I still have my implementation, which seems to work quite okay). The only problem is that R seems to use Marsaglia's code, and I don't have access to e.g. Mathematica, which should implement several algorithms, so I might run into problems when I have to perform tests.

> Kolmogorov Smirnov Distribution
> -------------------------------
>
> Key: MATH-437
> URL: https://issues.apache.org/jira/browse/MATH-437
> Project: Commons Math
> Issue Type: New Feature
> Reporter: Mikkel Meyer Andersen
> Assignee: Mikkel Meyer Andersen
> Priority: Minor
> Fix For: 3.0
>
> Original Estimate: 0.25h
> Remaining Estimate: 0.25h
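The difference between the two index conventions discussed in the comment above only shows up when n*d lands exactly on an integer. The following sketch, with hypothetical class and method names and assuming h is defined as k - n*d as in the Marsaglia-Tsang-Wang setup, illustrates that case; it is not code from the patch.

```java
// Illustrative sketch only. KsKnotIndex and its methods are hypothetical
// names; h = k - n*d is assumed from the article's setup.
final class KsKnotIndex {

    /** Convention in the original C code: k = floor(n*d) + 1, so h lies in (0, 1]. */
    static int kFloorPlusOne(int n, double d) {
        return (int) Math.floor(n * d) + 1;
    }

    /** Corrected convention: k = ceil(n*d), so h lies in [0, 1). */
    static int kCeil(int n, double d) {
        return (int) Math.ceil(n * d);
    }

    /** Offset h for a given index k. */
    static double h(int k, int n, double d) {
        return k - n * d;
    }
}
```

For n = 8 and d = 0.25, where n*d is exactly 2, the floor-based index gives k = 3 and h = 1, while the ceiling-based index gives k = 2 and h = 0. When n*d is not an integer, both conventions pick the same k.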
[jira] Commented: (MATH-437) Kolmogorov Smirnov Distribution
[ https://issues.apache.org/jira/browse/MATH-437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933166#action_12933166 ]

Richard Simard commented on MATH-437:

If you used the same x in all 3 cases, I believe there is a bug in your exact and not exact codes because you get only 2 decimal digits of precision.

> Kolmogorov Smirnov Distribution
> -------------------------------
>
> Key: MATH-437
> URL: https://issues.apache.org/jira/browse/MATH-437
> Project: Commons Math
> Issue Type: New Feature
> Reporter: Mikkel Meyer Andersen
> Assignee: Mikkel Meyer Andersen
> Priority: Minor
> Attachments: KolmogorovSmirnovDistribution.java
>
> Original Estimate: 0.25h
> Remaining Estimate: 0.25h
[jira] Commented: (MATH-437) Kolmogorov Smirnov Distribution
[ https://issues.apache.org/jira/browse/MATH-437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933161#action_12933161 ]

Mikkel Meyer Andersen commented on MATH-437:

Richard,

Thanks for your knowledgeable comment. To quote you:
{quote}"The argument x that you used in the Simard-L'écuyer program is not the same that you used for the other two programs."{quote}

I'm not sure what you mean by that. Which argument should I then use to expect the same result?

> Kolmogorov Smirnov Distribution
> -------------------------------
>
> Key: MATH-437
> URL: https://issues.apache.org/jira/browse/MATH-437
> Project: Commons Math
> Issue Type: New Feature
> Reporter: Mikkel Meyer Andersen
> Assignee: Mikkel Meyer Andersen
> Priority: Minor
> Attachments: KolmogorovSmirnovDistribution.java
>
> Original Estimate: 0.25h
> Remaining Estimate: 0.25h
[jira] Commented: (MATH-437) Kolmogorov Smirnov Distribution
[ https://issues.apache.org/jira/browse/MATH-437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933077#action_12933077 ]

Richard Simard commented on MATH-437:

http://www.mail-archive.com/issues@commons.apache.org/msg15829.html

F(n, x) = F(200, 0.03):
Lecuyer (2.0 ms.) = 0.012916146481628863
KolmogorovSmirnovDistribution exact (51902.0 ms.) = 0.012149763742041911
KolmogorovSmirnovDistribution !exact (9.0 ms.) = 0.012149763742041922

The argument x that you used in the Simard-L'écuyer program is not the same that you used for the other two programs. Of course you then get very different results.

If I compute exactly in Mathematica, I obtain

F(200, 0.03) = 0.0129161464816289

which is very different than your exact results above and agrees well with our program.

Richard Simard
Laboratoire de simulation et d'optimisation
Université de Montréal, IRO

> Kolmogorov Smirnov Distribution
> -------------------------------
>
> Key: MATH-437
> URL: https://issues.apache.org/jira/browse/MATH-437
> Project: Commons Math
> Issue Type: New Feature
> Reporter: Mikkel Meyer Andersen
> Assignee: Mikkel Meyer Andersen
> Priority: Minor
> Attachments: KolmogorovSmirnovDistribution.java
>
> Original Estimate: 0.25h
> Remaining Estimate: 0.25h
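One way to quantify the disagreement reported in the comment above is to count the significant digits on which a computed value agrees with the Mathematica reference. The helper below is a hypothetical illustration, not part of any attachment; it assumes a nonzero reference value and uses the relative error.

```java
// Illustrative sketch only. AgreementDigits is a hypothetical helper.
final class AgreementDigits {

    /**
     * Approximate number of significant decimal digits on which value
     * agrees with reference: -log10(|value - reference| / |reference|).
     * Assumes reference is nonzero.
     */
    static double agreeingDigits(double value, double reference) {
        if (value == reference) {
            return Double.POSITIVE_INFINITY;
        }
        return -Math.log10(Math.abs(value - reference) / Math.abs(reference));
    }
}
```

With the reference 0.0129161464816289, the Lecuyer result 0.012916146481628863 agrees to about 14.5 digits, while the "exact" result 0.012149763742041911 agrees to only about 1.2 digits.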
[jira] Commented: (MATH-437) Kolmogorov Smirnov Distribution
[ https://issues.apache.org/jira/browse/MATH-437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932361#action_12932361 ]

Mikkel Meyer Andersen commented on MATH-437:

The last part of roundedK can be replaced with
{{
// Scale the matrix entry by n!/n^n, one factor i/n at a time.
double pFrac = Hpower.getEntry(k - 2, k - 2);
for (int i = 1; i <= n; ++i) {
    pFrac *= (double) i / (double) n;
}
return pFrac;
}}
to get even better running time and still precise results:
{{
F(n, x) = F(200, 0.02):
Lecuyer (3.0 ms.) = 5.151982014280042E-6
KolmogorovSmirnovDistribution exact (760.0 ms.) = 5.15198201428005E-6
KolmogorovSmirnovDistribution !exact (16.0 ms.) = 5.151982014280049E-6
-
F(n, x) = F(200, 0.03):
Lecuyer (2.0 ms.) = 0.012916146481628863
KolmogorovSmirnovDistribution exact (51902.0 ms.) = 0.012149763742041911
KolmogorovSmirnovDistribution !exact (9.0 ms.) = 0.012149763742041922
-
F(n, x) = F(200, 0.04):
Lecuyer (0.0 ms.) = 0.1067121882956352
KolmogorovSmirnovDistribution exact (5903.0 ms.) = 0.10671370113626812
KolmogorovSmirnovDistribution !exact (6.0 ms.) = 0.10671370113626813
-
}}

> Kolmogorov Smirnov Distribution
> -------------------------------
>
> Key: MATH-437
> URL: https://issues.apache.org/jira/browse/MATH-437
> Project: Commons Math
> Issue Type: New Feature
> Reporter: Mikkel Meyer Andersen
> Assignee: Mikkel Meyer Andersen
> Priority: Minor
> Attachments: KolmogorovSmirnovDistribution.java
>
> Original Estimate: 0.25h
> Remaining Estimate: 0.25h
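The loop in the comment above multiplies by n!/n^n one factor at a time. A self-contained sketch of why that ordering matters in double precision follows; the class name and the naive variant are hypothetical, written for contrast only, and the naive variant is not claimed to be the code that the loop replaced.

```java
// Illustrative sketch only. FactorialScaling and both methods are
// hypothetical; they are not taken from KolmogorovSmirnovDistribution.java.
final class FactorialScaling {

    /** Multiplies value by n!/n^n as a running product of the factors i/n. */
    static double scaleIncrementally(double value, int n) {
        double scaled = value;
        for (int i = 1; i <= n; ++i) {
            scaled *= (double) i / (double) n;
        }
        return scaled;
    }

    /**
     * Contrast case: forms n! and n^n separately before dividing.
     * Both overflow to Infinity in double precision for n = 200,
     * so the quotient is NaN.
     */
    static double scaleNaively(double value, int n) {
        double factorial = 1.0;
        for (int i = 2; i <= n; ++i) {
            factorial *= i;
        }
        return value * factorial / Math.pow(n, n);
    }
}
```

For small n the two agree (for n = 4 both give 24/256 = 0.09375), but for n = 200, the sample size used in the timings above, only the incremental form returns a finite value.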