[jira] [Commented] (SPARK-15041) adding mode strategy for ml.feature.Imputer for categorical features

2024-06-13 Thread Chhavi Bansal (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-15041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854661#comment-17854661
 ] 

Chhavi Bansal commented on SPARK-15041:
---

Are there plans to support imputation for categorical string-type columns? 
What is the recommended way to handle such cases?

> adding mode strategy for ml.feature.Imputer for categorical features
> 
>
> Key: SPARK-15041
> URL: https://issues.apache.org/jira/browse/SPARK-15041
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Reporter: yuhao yang
>Priority: Minor
>  Labels: bulk-closed
>
> Adding mode strategy for ml.feature.Imputer for categorical features. This 
> needs to wait until the PR for SPARK-13568 gets merged:
> https://github.com/apache/spark/pull/11601
> From the comments of jkbradley and Nick Pentreath in the PR:
> {quote}
> Investigate efficiency of approaches using DataFrame/Dataset and/or approx 
> approaches such as frequentItems or Count-Min Sketch (will require an update 
> to CMS to return "heavy-hitters").
> investigate if we can use metadata to only allow mode for categorical 
> features (or perhaps as an easier alternative, allow mode for only Int/Long 
> columns)
> {quote}
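
For readers unfamiliar with the term, the "mode" strategy discussed above replaces each missing value with the most frequent non-missing value in the column. The following is a minimal plain-Python sketch of that idea only. It is not Spark's Imputer API and not the approach taken in the PR; the helper names mode_of and impute_mode, the use of None as the missing-value marker, and the tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter
from typing import Hashable, List, Optional, Sequence


def mode_of(values: Sequence[Optional[Hashable]]) -> Optional[Hashable]:
    """Return the most frequent non-missing value, or None if all are missing.

    Hypothetical helper: None is assumed to mark a missing value.
    """
    counts = Counter(v for v in values if v is not None)
    if not counts:
        return None
    best_count = max(counts.values())
    # Assumed tie-break: among equally frequent values, pick the smallest
    # by string order so the result is deterministic.
    return min((v for v, c in counts.items() if c == best_count), key=str)


def impute_mode(values: Sequence[Optional[Hashable]]) -> List[Optional[Hashable]]:
    """Replace every missing value with the column's mode (exact count)."""
    fill = mode_of(values)
    return [fill if v is None else v for v in values]


# Example with a categorical string column.
colors = ["red", None, "blue", "red", None, "green"]
print(impute_mode(colors))
```

This computes exact counts in memory, which is the simplest possible baseline. The approximate approaches mentioned in the quote (frequentItems, Count-Min Sketch) trade exactness for lower cost on large distributed data and are not shown here.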



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15041) adding mode strategy for ml.feature.Imputer for categorical features

2018-09-15 Thread Manu Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-15041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16616243#comment-16616243
 ] 

Manu Zhang commented on SPARK-15041:


Is there a plan to add strategies such as min/max?







[jira] [Commented] (SPARK-15041) adding mode strategy for ml.feature.Imputer for categorical features

2016-04-30 Thread Gayathri Murali (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15265411#comment-15265411
 ] 

Gayathri Murali commented on SPARK-15041:
-

I can work on this.



