[jira] [Commented] (SPARK-13783) Model export/import for spark.ml: GBTs

2016-04-06 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229700#comment-15229700
 ] 

Apache Spark commented on SPARK-13783:
--

User 'yanboliang' has created a pull request for this issue:
https://github.com/apache/spark/pull/12230

> Model export/import for spark.ml: GBTs
> --
>
> Key: SPARK-13783
> URL: https://issues.apache.org/jira/browse/SPARK-13783
> Project: Spark
> Issue Type: Sub-task
> Components: ML
> Reporter: Joseph K. Bradley
>
> This JIRA is for both GBTClassifier and GBTRegressor.  The implementation 
> should reuse the one for DecisionTree*.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13783) Model export/import for spark.ml: GBTs

2016-04-05 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15226993#comment-15226993
 ] 

Joseph K. Bradley commented on SPARK-13783:
---

Great, thanks!




[jira] [Commented] (SPARK-13783) Model export/import for spark.ml: GBTs

2016-04-04 Thread Yanbo Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225525#comment-15225525
 ] 

Yanbo Liang commented on SPARK-13783:
-

[~josephkb] I will work on this.




[jira] [Commented] (SPARK-13783) Model export/import for spark.ml: GBTs

2016-04-04 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224954#comment-15224954
 ] 

Joseph K. Bradley commented on SPARK-13783:
---

Would someone like to take this now?




[jira] [Commented] (SPARK-13783) Model export/import for spark.ml: GBTs

2016-03-28 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15214817#comment-15214817
 ] 

Joseph K. Bradley commented on SPARK-13783:
---

[~GayathriMurali] Sounds good!  Please ping both [~yanboliang] and me when you 
send your PR.




[jira] [Commented] (SPARK-13783) Model export/import for spark.ml: GBTs

2016-03-27 Thread Yanbo Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15213359#comment-15213359
 ] 

Yanbo Liang commented on SPARK-13783:
-

[~GayathriMurali] Please go first; I will help review your code. After it 
gets merged, I will start my PR. Thanks!




[jira] [Commented] (SPARK-13783) Model export/import for spark.ml: GBTs

2016-03-25 Thread Gayathri Murali (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15212442#comment-15212442
 ] 

Gayathri Murali commented on SPARK-13783:
-

Thanks [~josephkb]. I can go first, as I am almost done making changes. I could 
definitely review [~yanboliang]'s code and would really appreciate the same 
help. 




[jira] [Commented] (SPARK-13783) Model export/import for spark.ml: GBTs

2016-03-25 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15212428#comment-15212428
 ] 

Joseph K. Bradley commented on SPARK-13783:
---

I'd prefer what [~GayathriMurali] mentioned; that's what is done in 
spark.mllib.  That should be more efficient (taking more advantage of columnar 
storage).

I do want us to save Params for each tree since that will be more robust to 
future code changes (rather than re-creating them based on the GBT params).  
However, that may require some code refactoring so that the GBT can get a set 
of {{jsonParams}} for each tree.  Given that, the GBT could store that JSON in 
another DataFrame.

How does that sound?

It may make sense to implement export/import for one ensemble model before the 
other since both might require changes to the single-tree save/load.  Would you 
mind helping to review each other's work?  Who would prefer to go first?  
Thanks!
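
The per-tree Params idea above can be sketched in plain Python. This is only an illustration, not Spark code: the function names, the dict-based tree representation, and the (treeID, jsonParams) row layout are assumptions made for the example. It shows each tree's params being serialized to its own JSON string and kept in a second table keyed by tree ID, so that loading reads back what was saved instead of re-deriving it from the GBT params.

```python
import json


def tree_metadata_rows(trees):
    # One row per tree: (treeID, jsonParams). Hypothetical layout for the
    # "another DataFrame" that would sit next to the node data.
    return [
        (tree_id, json.dumps(tree["params"], sort_keys=True))
        for tree_id, tree in enumerate(trees)
    ]


def restore_tree_params(metadata_rows):
    # Rebuild each tree's params from its own stored JSON, independent of
    # whatever the ensemble-level params say at load time.
    return {tree_id: json.loads(params_json)
            for tree_id, params_json in metadata_rows}
```

Because every tree carries its own JSON record, a param added to the single-tree model later would be saved and restored without changing the ensemble code.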




[jira] [Commented] (SPARK-13783) Model export/import for spark.ml: GBTs

2016-03-24 Thread Gayathri Murali (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15211407#comment-15211407
 ] 

Gayathri Murali commented on SPARK-13783:
-

[~yanboliang] I am working on Random Forest and I have similar options for 
discussion. One more suggestion here:

1. Store all trees in a single DataFrame, saved to a single Parquet file, with a 
treeID added to the node data; each tree is then reconstructed using the 
pre-order approach used in DecisionTree. Does this approach work? 
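
A minimal sketch of the treeID suggestion above, in plain Python rather than Spark. The Node class, the row layout (tree_id, pre_order_index, prediction, is_leaf), and the function names are assumptions for illustration; they are not the actual NodeData schema. An explicit pre-order index is stored because rows read back from a table are not assumed to keep their order.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    prediction: float
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def flatten(tree_id, root):
    # Emit one row per node in pre-order: parent, left subtree, right subtree.
    rows = []

    def visit(node):
        is_leaf = node.left is None
        rows.append((tree_id, len(rows), node.prediction, is_leaf))
        if not is_leaf:
            visit(node.left)
            visit(node.right)

    visit(root)
    return rows


def rebuild(rows):
    # Consume rows in pre-order; an internal node is followed by its
    # complete left subtree and then its complete right subtree.
    it = iter(sorted(rows, key=lambda r: r[1]))

    def build():
        _, _, prediction, is_leaf = next(it)
        if is_leaf:
            return Node(prediction)
        left = build()
        right = build()
        return Node(prediction, left, right)

    return build()


def save_forest(trees):
    # All trees share one table; tree_id tells the rows apart.
    return [row for tree_id, root in enumerate(trees)
            for row in flatten(tree_id, root)]


def load_forest(rows):
    by_tree = {}
    for row in rows:
        by_tree.setdefault(row[0], []).append(row)
    return [rebuild(by_tree[tree_id]) for tree_id in sorted(by_tree)]
```

The sketch assumes every internal node has exactly two children, which is what lets a single is_leaf flag drive the reconstruction.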




[jira] [Commented] (SPARK-13783) Model export/import for spark.ml: GBTs

2016-03-24 Thread Yanbo Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15211392#comment-15211392
 ] 

Yanbo Liang commented on SPARK-13783:
-

GBTClassificationModel contains an array of DecisionTreeRegressionModel. For 
import/export, we have two options for discussion:

* #1 We iteratively call DecisionTreeRegressionModel.save() to save each 
DecisionTreeRegressionModel to a folder under "data/tree/" and load them 
iteratively using DecisionTreeRegressionModel.load(). We can reuse all the 
save/load functions of DecisionTree, and we can persist each DecisionTree's 
params, such as "numFeatures", which can be used to reconstruct the 
DecisionTreeRegressionModel. But with this option, we cannot store the GBT 
model in a single DataFrame.

* #2 We know that each DecisionTreeRegressionModel is stored as Seq[NodeData] 
in a column of a DataFrame. We can store the GBT as Seq[Seq[NodeData]]. But we 
cannot save the params of each DecisionTreeRegressionModel. If the DT model 
later needs extra params for reconstruction, we would have to handle them 
specially.

I vote for #1 and look forward to other comments. [~josephkb]
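
The trade-off between the two options can be sketched in plain Python. This is an illustration under stated assumptions, not the Spark implementation: save_tree and load_tree are hypothetical stand-ins for DecisionTreeRegressionModel.save() and load(), trees are plain dicts, and JSON files stand in for the real storage format.

```python
import json
from pathlib import Path


def save_tree(tree, path):
    # Hypothetical stand-in for DecisionTreeRegressionModel.save():
    # each tree writes its own params next to its node data.
    path.mkdir(parents=True, exist_ok=True)
    (path / "params.json").write_text(json.dumps(tree["params"]))
    (path / "nodes.json").write_text(json.dumps(tree["nodes"]))


def load_tree(path):
    # Hypothetical stand-in for DecisionTreeRegressionModel.load().
    return {
        "params": json.loads((path / "params.json").read_text()),
        "nodes": json.loads((path / "nodes.json").read_text()),
    }


def save_ensemble_per_tree(trees, root):
    # Option #1: one folder per tree under data/tree/<index>.
    for index, tree in enumerate(trees):
        save_tree(tree, Path(root) / "data" / "tree" / str(index))


def load_ensemble_per_tree(root):
    # Sort folders numerically so the trees come back in their saved order.
    tree_dirs = sorted((Path(root) / "data" / "tree").iterdir(),
                       key=lambda p: int(p.name))
    return [load_tree(d) for d in tree_dirs]


def nested_node_column(trees):
    # Option #2: a single nested value (a list of per-tree node lists).
    # Per-tree params are not part of this value, so they are dropped.
    return [tree["nodes"] for tree in trees]
```

Option #1 round-trips both params and nodes at the cost of many small outputs; option #2 keeps everything in one value but would need the params handled separately.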




[jira] [Commented] (SPARK-13783) Model export/import for spark.ml: GBTs

2016-03-19 Thread yuhao yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201937#comment-15201937
 ] 

yuhao yang commented on SPARK-13783:


I haven't started it yet. Go ahead please.




[jira] [Commented] (SPARK-13783) Model export/import for spark.ml: GBTs

2016-03-19 Thread Yanbo Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201303#comment-15201303
 ] 

Yanbo Liang commented on SPARK-13783:
-

Hi [~yuhaoyan], are you working on this issue? If not, I can give it a try.




[jira] [Commented] (SPARK-13783) Model export/import for spark.ml: GBTs

2016-03-09 Thread yuhao yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15188358#comment-15188358
 ] 

yuhao yang commented on SPARK-13783:


I'm interested. 
