[jira] [Commented] (SPARK-6162) Handle missing values in GBM

Joseph K. Bradley (JIRA) Sun, 06 Mar 2016 17:25:16 -0800

    [ 
https://issues.apache.org/jira/browse/SPARK-6162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15182434#comment-15182434
 ]


Joseph K. Bradley commented on SPARK-6162:
------------------------------------------

I agree this will be nice to add someday, but it's less pressing than other 
tasks for now.

> Handle missing values in GBM
> ----------------------------
>
>                 Key: SPARK-6162
>                 URL: https://issues.apache.org/jira/browse/SPARK-6162
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 1.2.1
>            Reporter: Devesh Parekh
>
> We build a lot of predictive models over data combined from multiple sources, 
> where some entries may not have all sources of data and so some values are 
> missing in each feature vector. Another place this might come up is if you 
> have features from slightly heterogeneous items (or items composed of 
> heterogeneous subcomponents) that share many features in common but may have 
> extra features for different types, and you don't want to manually train 
> models for every different type.
> R's gbm package, which we currently use, handles this type of data nicely 
> by adding a "missing" node to splits in the decision tree (a surrogate 
> split) for features that can have missing values. We'd like to do the same 
> with MLlib, but LabeledPoint would need to support missing values, and 
> GradientBoostedTrees would need to be modified to handle them.
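
For illustration only, here is a minimal Python sketch of the "missing" node idea described in the issue: each split has a third child that is followed when the feature value is absent. This is not MLlib or gbm code. The names Leaf, Split, and predict are hypothetical, and encoding a missing value as NaN is an assumption made for this example.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class Leaf:
    value: float  # prediction contributed by this leaf


@dataclass
class Split:
    feature: int      # index into the feature vector
    threshold: float
    left: "Node"      # followed when x[feature] <= threshold
    right: "Node"     # followed when x[feature] > threshold
    missing: "Node"   # followed when x[feature] is NaN (assumed encoding of "missing")


Node = Union[Leaf, Split]


def predict(node: Node, x: Sequence[float]) -> float:
    """Walk a single tree, routing missing values to the dedicated child."""
    while isinstance(node, Split):
        v = x[node.feature]
        if math.isnan(v):
            node = node.missing
        elif v <= node.threshold:
            node = node.left
        else:
            node = node.right
    return node.value


# A one-split tree with hypothetical leaf values.
tree = Split(feature=0, threshold=2.5,
             left=Leaf(-1.0), right=Leaf(1.0), missing=Leaf(0.1))
```

A boosted ensemble would sum such per-tree predictions. Training would also need to choose a value for the missing child at each split, which this sketch does not cover.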



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

