[ https://issues.apache.org/jira/browse/SPARK-34080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weichen Xu resolved SPARK-34080.
--------------------------------
    Fix Version/s: 3.2.0
                   3.1.1
       Resolution: Fixed

Issue resolved by pull request 31160
[https://github.com/apache/spark/pull/31160]

> Add UnivariateFeatureSelector to deprecate existing selectors
> -------------------------------------------------------------
>
>                 Key: SPARK-34080
>                 URL: https://issues.apache.org/jira/browse/SPARK-34080
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>    Affects Versions: 3.2.0, 3.1.1
>            Reporter: Xiangrui Meng
>            Assignee: Huaxin Gao
>            Priority: Critical
>             Fix For: 3.1.1, 3.2.0
>
>
> In SPARK-26111, we introduced a few univariate feature selectors, which share
> a common set of params. They are named after the underlying test, which
> requires users to understand the test to find the matching scenario. It would
> be nice to introduce a single class called UnivariateFeatureSelector that
> accepts a selection criterion and a score method (string names). Then we can
> deprecate all other univariate selectors.
> For the params, instead of asking users to provide the score function to use,
> it is friendlier to ask users to specify the feature and label types
> (continuous or categorical), and we set a default score function for each
> combo. We can also detect the types from feature metadata if given. Advanced
> users can override it (if multiple score functions are compatible with the
> feature type and label type combo). Example (param names are not finalized):
> {code}
> selector = UnivariateFeatureSelector(featureCols=["x", "y", "z"],
> labelCol=["target"], featureType="categorical", labelType="continuous",
> select="bestK", k=100)
> {code}
> cc: [~huaxingao] [~ruifengz] [~weichenxu123]
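
The idea of a default score function per feature type and label type combo can be sketched in plain Python. This is only an illustration under stated assumptions, not Spark's implementation: `DEFAULT_SCORE_FUNCTIONS` and `default_score_function` are hypothetical names, the score-function labels borrow scikit-learn's naming (`chi2`, `f_classif`, `f_regression`), and the three pairings listed are an assumed default table. Combos without an entry, such as categorical features with a continuous label, raise an error in this sketch.

```python
from typing import Dict, Tuple

# Hypothetical sketch, not part of Spark's API.
# The two type names come from the issue text ("continuous or categorical").
VALID_TYPES = ("categorical", "continuous")

# Assumed default table: (featureType, labelType) -> score function label.
# The labels reuse scikit-learn's names purely as identifiers.
DEFAULT_SCORE_FUNCTIONS: Dict[Tuple[str, str], str] = {
    ("categorical", "categorical"): "chi2",         # chi-squared test
    ("continuous", "categorical"): "f_classif",     # ANOVA F-test
    ("continuous", "continuous"): "f_regression",   # F-value for regression
}


def default_score_function(feature_type: str, label_type: str) -> str:
    """Return the assumed default score function label for a type combo."""
    # Reject anything other than the two type names used in the issue.
    for name, value in (("featureType", feature_type), ("labelType", label_type)):
        if value not in VALID_TYPES:
            raise ValueError(f"{name} must be one of {VALID_TYPES}, got {value!r}")
    key = (feature_type, label_type)
    # A valid combo may still have no default in the assumed table.
    if key not in DEFAULT_SCORE_FUNCTIONS:
        raise ValueError(
            f"no default score function for featureType={feature_type!r}, "
            f"labelType={label_type!r}"
        )
    return DEFAULT_SCORE_FUNCTIONS[key]
```

A caller would pass the two type strings, for example `default_score_function("continuous", "categorical")`, and an advanced user could bypass the lookup by naming a compatible score function directly, as the issue suggests.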