[ https://issues.apache.org/jira/browse/SPARK-15656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15307629#comment-15307629 ]
Jieyuan Chen commented on SPARK-15656:
--------------------------------------

Thanks for the answer. I made a mistake: I assumed the parameter passed in should be the original random variable values, as with `kolmogorovSmirnovTest`, but it should actually be frequencies.

> ChiSqTest for goodness of fit doesn't test against a wrong uniform distribution by default
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-15656
>                 URL: https://issues.apache.org/jira/browse/SPARK-15656
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>    Affects Versions: 1.5.1, 1.6.1
>            Reporter: Jieyuan Chen
>              Labels: easyfix, mllib, stats
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> I've been running a ChiSqTest to test whether my samples fit a uniform
> distribution.
> The documentation says that if a second vector to test against is not
> supplied as a parameter, the test runs against a uniform distribution. But
> when I pass samples drawn from a normal distribution, the p-value calculated
> is 1.0, which is wrong.
> The problem is that in ChiSqTest.scala, the `chiSquared` function will
> generate a wrong uniform distribution if the expected vector is not supplied.
> The default generated samples should be
> val expArr = if (expected.size == 0) Array.tabulate(size)(i => i.toDouble / size)
>              else expected.toArray
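
To illustrate the distinction between raw values and frequencies, here is a minimal sketch in plain Scala. It is not Spark's implementation: the object name, the helpers `binCounts` and `chiSquaredUniform`, the bin range, and the number of bins are all hypothetical choices made for this example. It bins raw samples into counts and computes the Pearson chi-squared statistic against equal expected counts per bin.

```scala
object GoodnessOfFitSketch {
  // Hypothetical helper: count how many samples fall into each of
  // `numBins` equal-width bins over [lo, hi). Samples outside the range are dropped.
  def binCounts(samples: Seq[Double], lo: Double, hi: Double, numBins: Int): Array[Double] = {
    val counts = Array.fill(numBins)(0.0)
    val width = (hi - lo) / numBins
    samples.foreach { x =>
      if (x >= lo && x < hi) {
        val idx = math.min(((x - lo) / width).toInt, numBins - 1)
        counts(idx) += 1.0
      }
    }
    counts
  }

  // Hypothetical helper: Pearson chi-squared statistic where every bin
  // has the same expected count (total / number of bins).
  def chiSquaredUniform(observed: Array[Double]): Double = {
    val expected = observed.sum / observed.length
    observed.map(o => (o - expected) * (o - expected) / expected).sum
  }

  def main(args: Array[String]): Unit = {
    val rng = new scala.util.Random(42)
    // Raw values drawn from a standard normal distribution.
    val normalSamples = Seq.fill(10000)(rng.nextGaussian())
    // Frequencies per bin: this is the kind of input a chi-squared test expects.
    val counts = binCounts(normalSamples, -3.0, 3.0, 6)
    println(s"bin counts: ${counts.mkString(", ")}")
    println(s"chi-squared statistic vs. uniform: ${chiSquaredUniform(counts)}")
  }
}
```

With Spark on the classpath, the resulting counts (rather than the raw samples) are what one would wrap with `Vectors.dense` and pass to `Statistics.chiSqTest`.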