[ https://issues.apache.org/jira/browse/SPARK-8884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joseph K. Bradley updated SPARK-8884:
-------------------------------------
    Target Version/s: 2.1.0  (was: 2.0.0)

> 1-sample Anderson-Darling Goodness-of-Fit test
> ----------------------------------------------
>
>                 Key: SPARK-8884
>                 URL: https://issues.apache.org/jira/browse/SPARK-8884
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Jose Cambronero
>
> We have implemented a 1-sample Anderson-Darling goodness-of-fit test to add
> to the current hypothesis testing functionality. The current implementation
> supports several distributions (normal, exponential, Gumbel, logistic, and
> Weibull). Users must provide the distribution parameters for all of these
> except the normal and exponential, whose parameters are estimated from the
> data. In contrast to other tests, such as the Kolmogorov-Smirnov test, we
> support only specific distributions, because the critical values depend on
> the distribution being tested.
> The distributed implementation of AD takes advantage of the fact that we can
> calculate a portion of the statistic within each partition of a sorted data
> set, independent of the global order of those observations. We can then carry
> some additional information that allows us to adjust the final amounts once
> we have collected 1 result per partition.
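
The partition-wise idea described in the issue can be illustrated with a small sketch. This is not the code from the patch: it is plain Python rather than Spark, the names `partition_summary`, `combine`, and `ad_direct` are hypothetical, and the choice of what "additional information" to carry (the element count and the two sums of log terms) is an assumption made for illustration. It uses the standard form of the statistic, A^2 = -n - (1/n) * sum_{i=1..n} (2i-1) * [ln F(x_(i)) + ln(1 - F(x_(n+1-i)))], and rewrites the global rank i as a partition offset plus a local rank.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal distribution with the given parameters."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def partition_summary(chunk, cdf):
    """Summarise one partition of globally sorted data using local ranks only.

    Returns (count, local_term, sum_log_f, sum_log_sf). The last two sums are
    the extra information (an assumption of this sketch) needed to correct the
    local ranks once the partition's global offset is known.
    """
    local_term = 0.0
    sum_log_f = 0.0
    sum_log_sf = 0.0
    for j, x in enumerate(chunk, start=1):  # j is the rank within the partition
        log_f = math.log(cdf(x))
        log_sf = math.log(1.0 - cdf(x))
        local_term += (2 * j - 1) * (log_f - log_sf)
        sum_log_f += log_f
        sum_log_sf += log_sf
    return len(chunk), local_term, sum_log_f, sum_log_sf


def combine(summaries):
    """Adjust the per-partition results with global offsets and finish A^2.

    Summaries must be listed in partition order; n must be positive.
    """
    n = sum(s[0] for s in summaries)
    offset = 0  # number of observations in all earlier partitions
    total = 0.0
    for count, local_term, sum_log_f, sum_log_sf in summaries:
        total += local_term + 2 * offset * sum_log_f + 2 * (n - offset) * sum_log_sf
        offset += count
    return -n - total / n


def ad_direct(sorted_data, cdf):
    """Single-pass reference computation over the whole sorted sample."""
    n = len(sorted_data)
    total = 0.0
    for i in range(1, n + 1):
        log_f = math.log(cdf(sorted_data[i - 1]))
        log_sf = math.log(1.0 - cdf(sorted_data[n - i]))
        total += (2 * i - 1) * (log_f + log_sf)
    return -n - total / n


# Made-up sample, split into three "partitions" of the sorted data.
data = sorted([0.3, -1.2, 2.2, 0.1, 1.5, -0.4, 0.9])
partitions = [data[:3], data[3:4], data[4:]]
summaries = [partition_summary(p, normal_cdf) for p in partitions]
print(combine(summaries), ad_direct(data, normal_cdf))
```

Each partition needs only its own elements to build its summary, so the per-partition work can run in parallel; the offsets are applied afterwards, once one small tuple per partition has been collected. The two printed values should agree up to floating-point rounding.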