[ https://issues.apache.org/jira/browse/SPARK-17645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yanbo Liang updated SPARK-17645:
--------------------------------
    Target Version/s: 2.2.0

> Add feature selector methods based on: False Discovery Rate (FDR) and Family Wise Error rate (FWE)
> --------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-17645
>                 URL: https://issues.apache.org/jira/browse/SPARK-17645
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML, MLlib
>            Reporter: Peng Meng
>            Assignee: Peng Meng
>            Priority: Minor
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Univariate feature selection works by selecting the best features based on
> univariate statistical tests.
> FDR and FWE are popular error-rate criteria for univariate feature selection.
> In 2005, the Benjamini and Hochberg paper on FDR was identified as one of the
> 25 most-cited statistical papers. In this PR, the FDR method uses the
> Benjamini-Hochberg procedure. https://en.wikipedia.org/wiki/False_discovery_rate
> In statistics, FWE is the probability of making one or more false
> discoveries, or type I errors, among all the hypotheses when performing
> multiple hypothesis tests.
> https://en.wikipedia.org/wiki/Family-wise_error_rate
> We add FDR and FWE methods for ChiSqSelector in this PR, as implemented in
> scikit-learn.
> http://scikit-learn.org/stable/modules/feature_selection.html#univariate-feature-selection
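
For illustration only, the two selection rules described above can be sketched in plain Python. This is not the code from the PR: the function names `select_fdr` and `select_fwe`, the `alpha` parameter, and the sample p-values are assumptions made for the example. The FDR rule follows the Benjamini-Hochberg step-up procedure, and the FWE rule is assumed to be a Bonferroni-style cut at alpha / m, which is how scikit-learn's SelectFwe behaves.

```python
def select_fdr(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up: return indices of selected features.

    Sort the p-values, find the largest rank k such that
    p_(k) <= alpha * k / m, and keep the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            cutoff = rank  # remember the largest rank that passes
    return sorted(order[:cutoff])


def select_fwe(p_values, alpha=0.05):
    """Bonferroni-style FWE control: keep features with p < alpha / m."""
    m = len(p_values)
    return [i for i, p in enumerate(p_values) if p < alpha / m]


# Hypothetical per-feature p-values, e.g. from a chi-squared test.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]

print(select_fdr(p_values))  # [0, 1]
print(select_fwe(p_values))  # [0]
```

With these sample values the FWE rule keeps only the first feature, while the FDR rule also keeps the second, which reflects that controlling the family-wise error rate is the stricter criterion.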