[ https://issues.apache.org/jira/browse/SPARK-34080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weichen Xu resolved SPARK-34080.
--------------------------------
    Fix Version/s: 3.2.0
                   3.1.1
       Resolution: Fixed

Issue resolved by pull request 31160
[https://github.com/apache/spark/pull/31160]

> Add UnivariateFeatureSelector to deprecate existing selectors
> -------------------------------------------------------------
>
>                 Key: SPARK-34080
>                 URL: https://issues.apache.org/jira/browse/SPARK-34080
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>    Affects Versions: 3.2.0, 3.1.1
>            Reporter: Xiangrui Meng
>            Assignee: Huaxin Gao
>            Priority: Critical
>             Fix For: 3.1.1, 3.2.0
>
>
> In SPARK-26111, we introduced a few univariate feature selectors, which share
> a common set of params. They are named after the underlying test, which
> requires users to understand the test to find the matching scenarios. It
> would be nice to introduce a single class called UnivariateFeatureSelector
> that accepts a selection criterion and a score method (string names). Then
> we can deprecate all the other univariate selectors.
>
> For the params, instead of asking users to provide the score function to use,
> it is more user-friendly to ask them to specify the feature and label types
> (continuous or categorical) and set a default score function for each
> combination. We can also detect the types from feature metadata, if given.
> Advanced users can override the default (if there are multiple score
> functions compatible with the feature type and label type combination).
> Example (param names are not finalized):
> {code}
> selector = UnivariateFeatureSelector(featureCols=["x", "y", "z"],
> labelCol="target", featureType="categorical", labelType="continuous",
> select="bestK", k=100)
> {code}
> cc: [~huaxingao] [~ruifengz] [~weichenxu123]
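
The selection-criterion half of the proposal can be sketched in plain Python, without Spark. This is a minimal illustration under stated assumptions, not Spark's implementation. The function name `select_features` is hypothetical. The mode strings "numTopFeatures" and "percentile" are borrowed from the `selectorType` values of the existing `ChiSqSelector`. Scores are assumed to be higher-is-better, like a test statistic rather than a p-value.

```python
from typing import Dict, List


def select_features(scores: Dict[str, float], mode: str, threshold: float) -> List[str]:
    """Return feature names chosen by a string-named selection criterion.

    Hypothetical helper (not a Spark API). `scores` maps feature name to a
    higher-is-better score; ties keep the dict's insertion order.
    """
    # Rank features from most to least relevant (sorted() is stable, even with reverse=True).
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    if mode == "numTopFeatures":
        # Keep the `threshold` best features.
        if threshold < 0:
            raise ValueError("numTopFeatures threshold must be >= 0")
        k = int(threshold)
    elif mode == "percentile":
        # Keep this fraction of all features, rounded down.
        if not 0.0 <= threshold <= 1.0:
            raise ValueError("percentile threshold must be in [0, 1]")
        k = int(len(ranked) * threshold)
    else:
        raise ValueError(f"unsupported selection mode: {mode!r}")
    return ranked[:k]


scores = {"x": 3.2, "y": 0.4, "z": 7.9}
print(select_features(scores, "numTopFeatures", 2))  # ['z', 'x']
print(select_features(scores, "percentile", 0.5))    # ['z']
```

Dispatching on a string gives every criterion a single entry point. That is the motivation for folding the separate test-named selectors into one class.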

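The proposal's type-driven defaults can also be sketched in plain Python, without Spark. The three default entries follow the table in Spark's ML feature documentation for the released `UnivariateFeatureSelector`:

- chi-squared for categorical features with a categorical label
- ANOVA F-test for continuous features with a categorical label
- F-value regression test for continuous features with a continuous label

The names `resolve_score_function` and `COMPATIBLE_SCORE_FUNCTIONS` and the short score names are hypothetical. So is the extra "mutual_info" option, which is included only to exercise the override path. The proposal's example uses categorical features with a continuous label. That combination has no entry in the table, so this sketch rejects it.

```python
from typing import Dict, Optional, Tuple

# (featureType, labelType) -> compatible score functions; the first entry is the default.
# Hypothetical names; "mutual_info" is an illustrative extra option, not a Spark setting.
COMPATIBLE_SCORE_FUNCTIONS: Dict[Tuple[str, str], Tuple[str, ...]] = {
    ("categorical", "categorical"): ("chi2",),
    ("continuous", "categorical"): ("anova_f",),
    ("continuous", "continuous"): ("f_regression", "mutual_info"),
}


def resolve_score_function(feature_type: str, label_type: str,
                           override: Optional[str] = None) -> str:
    """Pick the default score function for a type combo, or validate an override."""
    options = COMPATIBLE_SCORE_FUNCTIONS.get((feature_type, label_type))
    if options is None:
        raise ValueError(
            f"unsupported combination: featureType={feature_type!r}, labelType={label_type!r}")
    if override is None:
        return options[0]  # default for this combination
    if override not in options:
        raise ValueError(f"{override!r} is not compatible with this type combination")
    return override


print(resolve_score_function("continuous", "categorical"))                # anova_f
print(resolve_score_function("continuous", "continuous", "mutual_info"))  # mutual_info
```

Users who only know their data types never pass an override. Advanced users pass one explicitly, and an incompatible choice fails fast instead of silently running the wrong test.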


--
This message was sent by Atlassian Jira
(v8.3.4#803005)