[ https://issues.apache.org/jira/browse/SPARK-6725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joseph K. Bradley updated SPARK-6725:
-------------------------------------
    Comment: was deleted

(was: Ping! Is anyone interested in picking up the GBT or RandomForest issues to get them into 2.0?)

> Model export/import for Pipeline API (Scala)
> --------------------------------------------
>
>                 Key: SPARK-6725
>                 URL: https://issues.apache.org/jira/browse/SPARK-6725
>             Project: Spark
>          Issue Type: Umbrella
>          Components: ML
>    Affects Versions: 1.3.0
>            Reporter: Joseph K. Bradley
>            Assignee: Joseph K. Bradley
>            Priority: Critical
>
> This is an umbrella JIRA for adding model export/import to the spark.ml API.
> This JIRA is for adding the internal Saveable/Loadable API and Parquet-based
> format, not for other formats like PMML.
> This will require the following steps:
> * Add export/import for all PipelineStages supported by spark.ml
> ** This will include some Transformers which are not Models.
> ** These can use almost the same format as the spark.mllib model save/load
> functions, but the model metadata must store a different class name (marking
> the class as a spark.ml class).
> * After all PipelineStages support save/load, add an interface which forces
> future additions to support save/load.
> *UPDATE*: In spark.ml, we could save feature metadata using DataFrames.
> Other libraries and formats can support this, and it would be great if we
> could too. We could do either of the following:
> * save() optionally takes a dataset (or schema), and load will return a
> (model, schema) pair.
> * Models themselves save the input schema.
> Both options would mean inheriting from new Saveable, Loadable types.
> *UPDATE: DESIGN DOC*: Here's a design doc which I wrote. If you have
> comments about the planned implementation, please comment in this JIRA.
> Thanks!
> [https://docs.google.com/document/d/1RleM4QiKwdfZZHf0_G6FBNaF7_koc1Ui7qfMT1pf4IA/edit?usp=sharing]
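
For readers trying to picture the first option in the description (save() optionally takes a schema, and load returns a (model, schema) pair), here is a minimal sketch in plain Scala. Everything in it is an assumption made for illustration: the names `Saveable`, `Loadable`, `Schema`, `Field`, and `ScalerModel` are hypothetical and are not the spark.ml API or the design doc's interface, and plain text files stand in for the Parquet-based format. The metadata file records the class name, mirroring the requirement that saved metadata identify the class that wrote it.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.{Files, Path}

// Hypothetical stand-ins for a DataFrame schema; not Spark types.
final case class Field(name: String, dataType: String)
final case class Schema(fields: Seq[Field])

// Option 1 from the description: save() optionally takes a schema,
// and load returns a (model, schema) pair.
trait Saveable {
  def save(dir: Path, schema: Option[Schema] = None): Unit
}

trait Loadable[M] {
  def load(dir: Path): (M, Option[Schema])
}

// A toy "model" with a single parameter, used only to exercise the traits.
final case class ScalerModel(factor: Double) extends Saveable {
  def transform(x: Double): Double = x * factor

  override def save(dir: Path, schema: Option[Schema]): Unit = {
    Files.createDirectories(dir)
    // The metadata stores the class name so a loader can check what wrote it.
    val metadata = Seq(s"class=${classOf[ScalerModel].getName}", s"factor=$factor")
    Files.write(dir.resolve("metadata"), metadata.mkString("\n").getBytes(UTF_8))
    // The schema file is written only when the caller supplies a schema.
    schema.foreach { s =>
      val lines = s.fields.map(f => s"${f.name}:${f.dataType}")
      Files.write(dir.resolve("schema"), lines.mkString("\n").getBytes(UTF_8))
    }
  }
}

object ScalerModel extends Loadable[ScalerModel] {
  private def readLines(p: Path): Seq[String] =
    new String(Files.readAllBytes(p), UTF_8).split("\n").toSeq.filter(_.nonEmpty)

  // Split "key<sep>value" at the first separator.
  private def splitOnce(line: String, sep: Char): (String, String) = {
    val i = line.indexOf(sep.toInt)
    require(i >= 0, s"malformed line: $line")
    (line.take(i), line.drop(i + 1))
  }

  override def load(dir: Path): (ScalerModel, Option[Schema]) = {
    val meta = readLines(dir.resolve("metadata")).map(splitOnce(_, '=')).toMap
    require(meta("class") == classOf[ScalerModel].getName,
      s"unexpected class: ${meta("class")}")
    val schemaPath = dir.resolve("schema")
    val schema =
      if (Files.exists(schemaPath)) {
        val fields = readLines(schemaPath).map { line =>
          val (name, dataType) = splitOnce(line, ':')
          Field(name, dataType)
        }
        Some(Schema(fields))
      } else None
    (ScalerModel(meta("factor").toDouble), schema)
  }
}

object SaveLoadSketch {
  def main(args: Array[String]): Unit = {
    val dir = Files.createTempDirectory("scaler-model")
    ScalerModel(2.5).save(dir, Some(Schema(Seq(Field("features", "double")))))
    val (model, schema) = ScalerModel.load(dir)
    println(s"loaded $model with schema $schema")
  }
}
```

The second option in the description (models themselves save the input schema) would move the schema from a save() argument into the model's own state; the on-disk layout in this sketch would not need to change.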