[ https://issues.apache.org/jira/browse/SPARK-13783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15211392#comment-15211392 ]
Yanbo Liang commented on SPARK-13783:
-------------------------------------

GBTClassificationModel contains an array of DecisionTreeRegressionModel. For import/export, we have two options for discussion:

* #1 We iteratively call DecisionTreeRegressionModel.save() to save each DecisionTreeRegressionModel to a folder under "data/tree/" and load them iteratively using DecisionTreeRegressionModel.load(). We can reuse all the save/load functions of DecisionTree, and we can persist each DecisionTree's params, such as "numFeatures", which can be used to reconstruct the DecisionTreeRegressionModel. But with this option, we cannot store the GBT model in a single DataFrame.
* #2 We know that each DecisionTreeRegressionModel is stored as Seq[NodeData] in a column of a DataFrame, so we can store the GBT as Seq[Seq[NodeData]]. But we cannot save the params of each DecisionTreeRegressionModel. If the DT model later needs extra params for reconstruction, we would have to handle them specially.

I vote for #1 and look forward to other comments. [~josephkb]

> Model export/import for spark.ml: GBTs
> --------------------------------------
>
>                 Key: SPARK-13783
>                 URL: https://issues.apache.org/jira/browse/SPARK-13783
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML
>            Reporter: Joseph K. Bradley
>
> This JIRA is for both GBTClassifier and GBTRegressor. The implementation
> should reuse the one for DecisionTree*.
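
To make the trade-off between the two options in the comment concrete, here is a minimal sketch in plain Python. It does not use Spark. `NodeData` and `Tree` are simplified, hypothetical stand-ins for the real NodeData and DecisionTreeRegressionModel, and the file names (`metadata.json`, `nodes.json`, `data.json`) are assumptions made only for illustration. It shows that option #1 round-trips per-tree params, while option #2 needs them supplied from elsewhere at load time.

```python
import json
import tempfile
from dataclasses import dataclass, asdict
from pathlib import Path
from typing import List


@dataclass
class NodeData:
    # Hypothetical, simplified node record; the real NodeData has more fields.
    id: int
    prediction: float


@dataclass
class Tree:
    # Stand-in for DecisionTreeRegressionModel: nodes plus one per-tree param.
    nodes: List[NodeData]
    num_features: int


def save_option1(trees, root):
    """Option #1: one folder per tree under data/tree/, params saved per tree."""
    for i, tree in enumerate(trees):
        tree_dir = Path(root) / "data" / "tree" / str(i)
        tree_dir.mkdir(parents=True, exist_ok=True)
        meta = {"numFeatures": tree.num_features}
        (tree_dir / "metadata.json").write_text(json.dumps(meta))
        nodes = [asdict(n) for n in tree.nodes]
        (tree_dir / "nodes.json").write_text(json.dumps(nodes))


def load_option1(root):
    trees = []
    tree_root = Path(root) / "data" / "tree"
    # Sort numerically so the tree order survives the round trip.
    for tree_dir in sorted(tree_root.iterdir(), key=lambda p: int(p.name)):
        meta = json.loads((tree_dir / "metadata.json").read_text())
        raw = json.loads((tree_dir / "nodes.json").read_text())
        nodes = [NodeData(**n) for n in raw]
        trees.append(Tree(nodes=nodes, num_features=meta["numFeatures"]))
    return trees


def save_option2(trees, root):
    """Option #2: one nested record (like Seq[Seq[NodeData]]); no per-tree params."""
    Path(root).mkdir(parents=True, exist_ok=True)
    nested = [[asdict(n) for n in tree.nodes] for tree in trees]
    (Path(root) / "data.json").write_text(json.dumps(nested))


def load_option2(root, num_features):
    # num_features must come from the caller because it was not saved per tree.
    nested = json.loads((Path(root) / "data.json").read_text())
    return [
        Tree(nodes=[NodeData(**n) for n in nodes], num_features=num_features)
        for nodes in nested
    ]
```

In this sketch, `load_option1` needs nothing beyond the path, whereas `load_option2` takes `num_features` as an extra argument. That extra argument is the "special handling" the comment warns about for option #2.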