[jira] [Commented] (SPARK-13783) Model export/import for spark.ml: GBTs
[ https://issues.apache.org/jira/browse/SPARK-13783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229700#comment-15229700 ]

Apache Spark commented on SPARK-13783:
--

User 'yanboliang' has created a pull request for this issue:
https://github.com/apache/spark/pull/12230

> Model export/import for spark.ml: GBTs
> --
>
> Key: SPARK-13783
> URL: https://issues.apache.org/jira/browse/SPARK-13783
> Project: Spark
> Issue Type: Sub-task
> Components: ML
> Reporter: Joseph K. Bradley
>
> This JIRA is for both GBTClassifier and GBTRegressor. The implementation
> should reuse the one for DecisionTree*.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13783) Model export/import for spark.ml: GBTs
[ https://issues.apache.org/jira/browse/SPARK-13783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15226993#comment-15226993 ]

Joseph K. Bradley commented on SPARK-13783:
---

Great, thanks!
[jira] [Commented] (SPARK-13783) Model export/import for spark.ml: GBTs
[ https://issues.apache.org/jira/browse/SPARK-13783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225525#comment-15225525 ]

Yanbo Liang commented on SPARK-13783:
-

[~josephkb] I will work on this.
[jira] [Commented] (SPARK-13783) Model export/import for spark.ml: GBTs
[ https://issues.apache.org/jira/browse/SPARK-13783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224954#comment-15224954 ]

Joseph K. Bradley commented on SPARK-13783:
---

Would someone like to take this now?
[jira] [Commented] (SPARK-13783) Model export/import for spark.ml: GBTs
[ https://issues.apache.org/jira/browse/SPARK-13783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15214817#comment-15214817 ]

Joseph K. Bradley commented on SPARK-13783:
---

[~GayathriMurali] Sounds good! Please ping both [~yanboliang] and me when you send your PR.
[jira] [Commented] (SPARK-13783) Model export/import for spark.ml: GBTs
[ https://issues.apache.org/jira/browse/SPARK-13783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15213359#comment-15213359 ]

Yanbo Liang commented on SPARK-13783:
-

[~GayathriMurali] Please go first; I will help review your code. After it gets merged, I will start my PR. Thanks!
[jira] [Commented] (SPARK-13783) Model export/import for spark.ml: GBTs
[ https://issues.apache.org/jira/browse/SPARK-13783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15212442#comment-15212442 ]

Gayathri Murali commented on SPARK-13783:
-

Thanks [~josephkb]. I can go first, as I am almost done making changes. I could definitely review [~yanboliang]'s code and would really appreciate the same help.
[jira] [Commented] (SPARK-13783) Model export/import for spark.ml: GBTs
[ https://issues.apache.org/jira/browse/SPARK-13783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15212428#comment-15212428 ]

Joseph K. Bradley commented on SPARK-13783:
---

I'd prefer what [~GayathriMurali] mentioned (storing all trees in a single DataFrame, with a treeID on each node); that's what is done in spark.mllib. It should be more efficient, since it takes more advantage of columnar storage.

I do want us to save Params for each tree, since that will be more robust to future code changes than re-creating them from the GBT params. However, that may require some refactoring so that the GBT can get a set of {{jsonParams}} for each tree. Given that, the GBT could store that JSON in another DataFrame. How does that sound?

It may make sense to implement export/import for one ensemble model (GBTs or random forests) before the other, since both might require changes to the single-tree save/load. Would you mind helping to review each other's work? Who would prefer to go first? Thanks!
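The following is a minimal sketch of the per-tree metadata table suggested above. It assumes each tree's params are available as a plain dict, a hypothetical stand-in for the refactored jsonParams. It also assumes the ensemble's tree weights are stored in the same table. Plain Python tuples stand in for DataFrame rows, and the function names and column layout are illustrative. This is not Spark's API.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical row layout for the per-tree metadata table: (treeID, paramsJson, weight).
MetadataRow = Tuple[int, str, float]


def save_tree_metadata(tree_params: List[Dict[str, object]],
                       tree_weights: List[float]) -> List[MetadataRow]:
    """Serialize each tree's own params to JSON: one row per tree, keyed by treeID."""
    if len(tree_params) != len(tree_weights):
        raise ValueError("need exactly one weight per tree")
    return [(tree_id, json.dumps(params, sort_keys=True), weight)
            for tree_id, (params, weight) in enumerate(zip(tree_params, tree_weights))]


def load_tree_metadata(
        rows: List[MetadataRow]) -> Tuple[List[Dict[str, object]], List[float]]:
    """Restore params and weights in treeID order; stored rows may come back unordered."""
    ordered = sorted(rows, key=lambda row: row[0])
    params = [json.loads(params_json) for _, params_json, _ in ordered]
    weights = [weight for _, _, weight in ordered]
    return params, weights
```

On load, each tree's params are read back from storage rather than re-derived from the ensemble-level params. That is the robustness argument made in the comment.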
[jira] [Commented] (SPARK-13783) Model export/import for spark.ml: GBTs
[ https://issues.apache.org/jira/browse/SPARK-13783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15211407#comment-15211407 ]

Gayathri Murali commented on SPARK-13783:
-

[~yanboliang] I am working on Random Forest, and I have similar options for discussion. One more suggestion:
1. Store all trees in a single DataFrame, saved to a single Parquet file. A treeID is added to the node data, and each tree is reconstructed using the preorder approach used in DecisionTree.

Does this approach work?
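The following is a minimal sketch of the single-table layout described above. It assumes a simplified binary-tree node, a hypothetical Node class with prediction, feature and threshold. The column names are also hypothetical and are not Spark's NodeData schema. Python lists of dicts stand in for the DataFrame/Parquet file. Each node gets a preorder id within its tree, and children are referenced by id. Loading groups rows by treeID and rebuilds each tree recursively from its root (id 0).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    # Hypothetical tree node: internal nodes always have both children, leaves have none.
    prediction: float
    feature: int = -1          # split feature index (-1 for leaves)
    threshold: float = 0.0     # split threshold (unused for leaves)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def flatten(tree_id: int, root: Node) -> List[dict]:
    """Assign preorder ids and emit one flat row per node, tagged with tree_id."""
    rows: List[dict] = []

    def visit(node: Node) -> int:
        node_id = len(rows)  # preorder: a node's id is assigned before its children's
        row = {"treeID": tree_id, "id": node_id, "prediction": node.prediction,
               "feature": node.feature, "threshold": node.threshold,
               "leftChild": -1, "rightChild": -1}
        rows.append(row)
        if node.left is not None:
            row["leftChild"] = visit(node.left)
            row["rightChild"] = visit(node.right)
        return node_id

    visit(root)
    return rows


def rebuild(rows: List[dict]) -> Node:
    """Rebuild one tree from its rows by following child ids from the root (id 0)."""
    by_id = {row["id"]: row for row in rows}

    def build(node_id: int) -> Node:
        row = by_id[node_id]
        node = Node(row["prediction"], row["feature"], row["threshold"])
        if row["leftChild"] != -1:
            node.left = build(row["leftChild"])
            node.right = build(row["rightChild"])
        return node

    return build(0)


def save_ensemble(trees: List[Node]) -> List[dict]:
    """Put every tree's nodes into one table; treeID says which tree a row belongs to."""
    return [row for tree_id, tree in enumerate(trees) for row in flatten(tree_id, tree)]


def load_ensemble(table: List[dict]) -> List[Node]:
    """Group rows by treeID and rebuild the trees in treeID order."""
    groups: Dict[int, List[dict]] = {}
    for row in table:
        groups.setdefault(row["treeID"], []).append(row)
    return [rebuild(groups[tree_id]) for tree_id in sorted(groups)]
```

Because node ids restart at 0 for each tree, every tree's root can be found without storing it separately. Rows from different trees can also be shuffled in storage without affecting reconstruction.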
[jira] [Commented] (SPARK-13783) Model export/import for spark.ml: GBTs
[ https://issues.apache.org/jira/browse/SPARK-13783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15211392#comment-15211392 ]

Yanbo Liang commented on SPARK-13783:
-

GBTClassificationModel contains an array of DecisionTreeRegressionModels. For import/export, we have two options for discussion:
* #1 We iteratively call DecisionTreeRegressionModel.save() to save each DecisionTreeRegressionModel to a folder under "data/tree/", and load them iteratively using DecisionTreeRegressionModel.load(). We can reuse all the save/load functions of DecisionTree, and we can persist each DecisionTree's params, such as "numFeatures", which can be used to reconstruct the DecisionTreeRegressionModel. But with this option, we cannot store the GBT model in a single DataFrame.
* #2 We know that each DecisionTreeRegressionModel is stored as Seq[NodeData] in a column of a DataFrame, so we can store the GBT as Seq[Seq[NodeData]]. But then we cannot save the params of each DecisionTreeRegressionModel. If the DT model needs extra params for reconstruction in the future, we would have to handle them specially.

I vote for #1 and look forward to other comments. [~josephkb]
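The following is a minimal sketch of option #1. It assumes each tree can be serialized on its own. Plain JSON files stand in for DecisionTreeRegressionModel.save()/load(), and the tree.json file name and the dict-shaped tree payload are hypothetical. The sketch writes one subfolder per tree under data/tree/ and reads the subfolders back in tree order. By contrast, option #2 would put every tree's node list into a single table, which has no natural slot for each tree's own params.

```python
import json
import os
import tempfile
from typing import Dict, List

# Hypothetical per-tree payload: the tree's own params plus its serialized nodes.
Tree = Dict[str, object]


def save_trees(path: str, trees: List[Tree]) -> None:
    """Write each tree to its own subfolder: <path>/data/tree/<index>/tree.json."""
    for index, tree in enumerate(trees):
        tree_dir = os.path.join(path, "data", "tree", str(index))
        os.makedirs(tree_dir, exist_ok=True)
        with open(os.path.join(tree_dir, "tree.json"), "w") as f:
            json.dump(tree, f)


def load_trees(path: str) -> List[Tree]:
    """Read the trees back, ordering subfolders numerically (so 10 sorts after 2)."""
    root = os.path.join(path, "data", "tree")
    trees: List[Tree] = []
    for index in sorted(int(name) for name in os.listdir(root)):
        with open(os.path.join(root, str(index), "tree.json")) as f:
            trees.append(json.load(f))
    return trees


def round_trip(trees: List[Tree]) -> List[Tree]:
    """Save to a throwaway directory and load the trees back."""
    with tempfile.TemporaryDirectory() as tmp:
        save_trees(tmp, trees)
        return load_trees(tmp)
```

The numeric sort matters once an ensemble has more than ten trees. A plain lexicographic directory listing would place folder 10 before folder 2 and silently reorder the ensemble.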
[jira] [Commented] (SPARK-13783) Model export/import for spark.ml: GBTs
[ https://issues.apache.org/jira/browse/SPARK-13783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201937#comment-15201937 ]

yuhao yang commented on SPARK-13783:

I haven't started it yet. Go ahead please.
[jira] [Commented] (SPARK-13783) Model export/import for spark.ml: GBTs
[ https://issues.apache.org/jira/browse/SPARK-13783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201303#comment-15201303 ]

Yanbo Liang commented on SPARK-13783:
-

Hi [~yuhaoyan], are you working on this issue? If not, I can give it a try.
[jira] [Commented] (SPARK-13783) Model export/import for spark.ml: GBTs
[ https://issues.apache.org/jira/browse/SPARK-13783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15188358#comment-15188358 ]

yuhao yang commented on SPARK-13783:

I'm interested.