[ https://issues.apache.org/jira/browse/SPARK-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Patrick Wendell resolved SPARK-6310.
------------------------------------
    Resolution: Duplicate

> ChiSqTest should check for too few counts
> -----------------------------------------
>
>                 Key: SPARK-6310
>                 URL: https://issues.apache.org/jira/browse/SPARK-6310
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 1.2.0
>            Reporter: Joseph K. Bradley
>
> ChiSqTest assumes that elements of the contingency matrix are large enough
> (have enough counts) s.t. the central limit theorem kicks in.  It would be
> reasonable to do one or more of the following:
> * Add a note in the docs about making sure there are a reasonable number of
> instances being used (or counts in the contingency table entries, to be more
> precise and account for skewed category distributions).
> * Add a check in the code which could:
> ** Log a warning message
> ** Alter the p-value to make sure it indicates the test result is
> insignificant
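
The check proposed in the issue could look roughly like the sketch below. It is plain Python and does not use or reflect Spark's ChiSqTest code. The function names, the logger name, and the threshold of 5 are all assumptions made for illustration; the threshold is a common rule of thumb and is not stated in the issue. The sketch also assumes the check is applied to expected counts under independence, which covers the skewed-category case the issue mentions.

```python
import logging

logger = logging.getLogger("chisq_sketch")  # hypothetical logger name

# Assumed threshold (a common rule of thumb); the issue does not specify one.
MIN_EXPECTED_COUNT = 5.0


def expected_counts(table):
    """Expected cell counts under independence: row_total * col_total / total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    if total == 0:
        raise ValueError("contingency table has no counts")
    return [[r * c / total for c in col_totals] for r in row_totals]


def low_count_cells(table, min_expected=MIN_EXPECTED_COUNT):
    """Return (row, col, expected) for every cell below the threshold."""
    return [
        (i, j, e)
        for i, row in enumerate(expected_counts(table))
        for j, e in enumerate(row)
        if e < min_expected
    ]


def guarded_p_value(p_value, table, min_expected=MIN_EXPECTED_COUNT):
    """Log a warning and report p = 1.0 (insignificant) when counts are too low.

    p_value is whatever the chi-squared test already computed; this function
    only decides whether that value should be trusted.
    """
    sparse = low_count_cells(table, min_expected)
    if sparse:
        logger.warning(
            "%d contingency table cell(s) have expected count below %s; "
            "the chi-squared approximation may be unreliable",
            len(sparse),
            min_expected,
        )
        return 1.0
    return p_value
```

With this sketch, a table such as `[[50, 30], [20, 40]]` passes the check and keeps its computed p-value, while a sparse table such as `[[3, 1], [1, 2]]` triggers the warning and is reported as insignificant. Whether to warn only or also override the p-value is the design choice the issue leaves open.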