Xiangrui Meng created SPARK-34080:
-------------------------------------

             Summary: Add UnivariateFeatureSelector to deprecate existing selectors
                 Key: SPARK-34080
                 URL: https://issues.apache.org/jira/browse/SPARK-34080
             Project: Spark
          Issue Type: New Feature
          Components: ML
    Affects Versions: 3.2.0
            Reporter: Xiangrui Meng


In SPARK-26111, we introduced a few univariate feature selectors, which share a 
common set of params. They are named after the underlying test, which requires 
users to understand the test to find the matching scenario. It would be nice to 
introduce a single class called UnivariateFeatureSelector that accepts a 
selection criterion and a score method (string names). Then we can deprecate 
all other univariate selectors.

For the params, instead of asking users which score function to use, it is 
friendlier to ask them to specify the feature and label types (continuous or 
categorical) and set a default score function for each combo. We can also 
detect the types from feature metadata if given. Advanced users can override 
the default (if multiple score functions are compatible with the feature type 
and label type combo). Example (param names are not finalized):

{code}
selector = UnivariateFeatureSelector(featureCols=["x", "y", "z"], 
labelCol=["target"], featureType="categorical", labelType="continuous")
{code}
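
To make the "default score function per combo, with an override" idea concrete, 
here is a minimal plain-Python sketch. Everything in it is an assumption for 
illustration: the function name resolve_score_function, the table 
COMPATIBLE_SCORE_FUNCTIONS, and the score function names (borrowed from 
scikit-learn) are hypothetical and not part of this proposal or of any Spark 
API. The sketch lists no score function for categorical features with a 
continuous label; that is a gap in the sketch, not a statement about what the 
selector should support.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical table: (featureType, labelType) -> compatible score functions.
# The first entry of each list is treated as the default for that combo.
# Names are illustrative only (borrowed from scikit-learn), not a Spark API.
COMPATIBLE_SCORE_FUNCTIONS: Dict[Tuple[str, str], List[str]] = {
    ("categorical", "categorical"): ["chi2", "mutual_info_classif"],
    ("continuous", "categorical"): ["f_classif", "mutual_info_classif"],
    ("continuous", "continuous"): ["f_regression", "mutual_info_regression"],
}


def resolve_score_function(
    feature_type: str,
    label_type: str,
    score_function: Optional[str] = None,
) -> str:
    """Pick a score function for the given feature/label type combo.

    If score_function is None, return the default for the combo.
    Otherwise validate the user's override against the compatible list.
    """
    key = (feature_type, label_type)
    compatible = COMPATIBLE_SCORE_FUNCTIONS.get(key)
    if compatible is None:
        raise ValueError(
            f"No score function listed for featureType={feature_type!r}, "
            f"labelType={label_type!r}"
        )
    if score_function is None:
        return compatible[0]
    if score_function not in compatible:
        raise ValueError(
            f"{score_function!r} is not compatible with {key}; "
            f"choose one of {compatible}"
        )
    return score_function
```

Most users would only pass the two types and get the default; advanced users 
would pass the third argument, and an incompatible choice fails early instead 
of producing a meaningless score.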


