[ 
https://issues.apache.org/jira/browse/SPARK-34080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17263006#comment-17263006
 ] 

Xiangrui Meng commented on SPARK-34080:
---------------------------------------

I'm not sure we have time to get this into the 3.1.1 release. But if there 
are other release blockers, it would be great to get the changes in now, so 
that we don't have to deprecate the APIs later.

> Add UnivariateFeatureSelector to deprecate existing selectors
> -------------------------------------------------------------
>
>                 Key: SPARK-34080
>                 URL: https://issues.apache.org/jira/browse/SPARK-34080
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>    Affects Versions: 3.2.0
>            Reporter: Xiangrui Meng
>            Priority: Major
>
> In SPARK-26111, we introduced a few univariate feature selectors that share 
> a common set of params. They are named after the underlying test, which 
> requires users to understand the test to find the matching scenario. It would 
> be nice to introduce a single class called UnivariateFeatureSelector that 
> accepts a selection criterion and a score method (string names). Then we can 
> deprecate all other univariate selectors.
> For the params, instead of asking users which score function to use, 
> it is friendlier to ask them to specify the feature and label types 
> (continuous or categorical) and to set a default score function for each 
> combo. We can also detect the types from feature metadata if given. Advanced 
> users can override the default (if multiple score functions are 
> compatible with the feature type and label type combo). Example (param names 
> are not finalized):
> {code}
> selector = UnivariateFeatureSelector(featureCols=["x", "y", "z"], 
> labelCol=["target"], featureType="categorical", labelType="continuous", 
> select="bestK", k=100)
> {code}
> cc: [~huaxingao] [~ruifengz] [~weichenxu123]
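
To make the "default score function per combo" idea concrete, here is a minimal Python sketch of how the lookup might work. It is an illustration only: the score function names ("chi2", "anova_f", "f_regression"), the helper resolve_score_function, and the set of supported combos are assumptions, not part of the proposal. The sketch maps only three combos, so the categorical-feature/continuous-label pair used in the example above is left unmapped because the issue does not name a test for it.

```python
from typing import Optional

# Hypothetical defaults keyed by (featureType, labelType).
# These names are illustrative assumptions, not a finalized API.
DEFAULT_SCORE_FUNCTION = {
    ("categorical", "categorical"): "chi2",
    ("continuous", "categorical"): "anova_f",
    ("continuous", "continuous"): "f_regression",
}


def resolve_score_function(
    feature_type: str,
    label_type: str,
    override: Optional[str] = None,
) -> str:
    """Pick a score function name for a feature/label type combo.

    An explicit override (the "advanced user" path) wins over the default.
    Combos without an entry in the table are rejected.
    """
    key = (feature_type, label_type)
    if key not in DEFAULT_SCORE_FUNCTION:
        raise ValueError(
            f"No default score function for featureType={feature_type!r}, "
            f"labelType={label_type!r}"
        )
    if override is not None:
        return override
    return DEFAULT_SCORE_FUNCTION[key]


if __name__ == "__main__":
    # Default path: the user only states the types.
    print(resolve_score_function("continuous", "categorical"))
    # Advanced path: the user overrides the default.
    print(resolve_score_function("continuous", "continuous", override="custom"))
```

Keeping the defaults in a single table means users only describe their data, while the choice of statistical test stays in one place that can change without touching the public params.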


