[ 
https://issues.apache.org/jira/browse/SPARK-13783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15211392#comment-15211392
 ] 

Yanbo Liang commented on SPARK-13783:
-------------------------------------

GBTClassificationModel contains an array of DecisionTreeRegressionModel. For 
import/export, we have two options for discussion:

* #1 We iteratively call DecisionTreeRegressionModel.save() to save each 
DecisionTreeRegressionModel to a folder under "data/tree/" and load them 
iteratively using DecisionTreeRegressionModel.load(). We can reuse all the 
save/load functions of DecisionTree, and we can persist each DecisionTree's 
params, such as "numFeatures", which can be used to reconstruct the 
DecisionTreeRegressionModel. But with this option, we cannot store the GBT 
model in a single DataFrame.

* #2 We know that each DecisionTreeRegressionModel is stored as Seq[NodeData] 
in a column of a DataFrame. We can store the GBT as Seq[Seq[NodeData]]. But we 
cannot save the params of each DecisionTreeRegressionModel. If the DT model 
later needs extra params for reconstruction, we would have to handle them 
specially.

I vote for #1 and look forward to other comments. [~josephkb]
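
To make the trade-off between the two options concrete, here is a rough, 
Spark-free sketch in plain Python. It is not the spark.ml implementation: the 
dict-based tree, the helper names, and the file names "metadata.json", 
"nodes.json" and "data.json" are all hypothetical stand-ins. Only the 
"data/tree/" layout and the "numFeatures" param come from the discussion above.

```python
import json
import tempfile
from pathlib import Path


def make_tree(num_features, nodes):
    # Hypothetical stand-in for one tree: its params plus a flat node list.
    return {"params": {"numFeatures": num_features}, "nodes": nodes}


def save_per_tree(trees, root):
    # Option #1: one folder per tree under data/tree/, each with its own params.
    for i, tree in enumerate(trees):
        tree_dir = Path(root) / "data" / "tree" / str(i)
        tree_dir.mkdir(parents=True)
        (tree_dir / "metadata.json").write_text(json.dumps(tree["params"]))
        (tree_dir / "nodes.json").write_text(json.dumps(tree["nodes"]))


def load_per_tree(root):
    # Load the folders in numeric order so the tree order is preserved.
    tree_root = Path(root) / "data" / "tree"
    dirs = sorted(tree_root.iterdir(), key=lambda p: int(p.name))
    return [
        {
            "params": json.loads((d / "metadata.json").read_text()),
            "nodes": json.loads((d / "nodes.json").read_text()),
        }
        for d in dirs
    ]


def save_single_record(trees, root):
    # Option #2: one nested list of node lists; per-tree params are dropped.
    nested = [tree["nodes"] for tree in trees]
    (Path(root) / "data.json").write_text(json.dumps(nested))


def load_single_record(root):
    return json.loads((Path(root) / "data.json").read_text())


trees = [
    make_tree(3, [{"id": 0, "prediction": 0.5}]),
    make_tree(3, [{"id": 0, "prediction": -0.2}]),
]

with tempfile.TemporaryDirectory() as tmp:
    save_per_tree(trees, tmp)
    restored = load_per_tree(tmp)      # params and nodes both come back
    save_single_record(trees, tmp)
    nested_nodes = load_single_record(tmp)  # nodes only, no params
```

In this sketch, option #1 round-trips each tree's params at the cost of many 
small files, while option #2 keeps everything in one record but has nowhere 
to put per-tree params unless they are handled separately.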

> Model export/import for spark.ml: GBTs
> --------------------------------------
>
>                 Key: SPARK-13783
>                 URL: https://issues.apache.org/jira/browse/SPARK-13783
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML
>            Reporter: Joseph K. Bradley
>
> This JIRA is for both GBTClassifier and GBTRegressor.  The implementation 
> should reuse the one for DecisionTree*.



