[jira] [Updated] (SPARK-14311) Model persistence in SparkR 2.0

Xiangrui Meng (JIRA) Fri, 29 Apr 2016 20:59:50 -0700

     [ https://issues.apache.org/jira/browse/SPARK-14311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Xiangrui Meng updated SPARK-14311:
----------------------------------
    Summary: Model persistence in SparkR 2.0  (was: Model persistence in SparkR)

> Model persistence in SparkR 2.0
> -------------------------------
>
>                 Key: SPARK-14311
>                 URL: https://issues.apache.org/jira/browse/SPARK-14311
>             Project: Spark
>          Issue Type: Umbrella
>          Components: ML, SparkR
>            Reporter: Xiangrui Meng
>            Assignee: Xiangrui Meng
>
> In Spark 2.0, we are going to have 4 ML models in SparkR: GLMs, k-means,
> naive Bayes, and AFT survival regression. Users can fit models, get summaries,
> and make predictions. However, they cannot save or load the models yet.
> Since ML models in SparkR are wrappers around ML pipelines, it should be
> straightforward to implement model persistence. We need to think more about
> the API. R uses save/load for objects and datasets (which are also objects). It is
> possible to overload save for ML models, e.g., save.NaiveBayesWrapper, but
> I'm not sure whether load can be overloaded as easily. I propose the following
> API:
> {code}
> model <- glm(formula, data = df)
> ml.save(model, path, mode = "overwrite")
> model2 <- ml.load(path)
> {code}
> We defined the wrappers as S4 classes, so `ml.save` is an S4 method and `ml.load`
> is an S3 method (correct me if I'm wrong).
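The asymmetry in the proposal can be illustrated with a minimal base-R sketch. `ml.save` receives a model object, so it can dispatch on the wrapper's S4 class. `ml.load` receives only a path, so there is nothing to dispatch on, and it has to recover the class from what was saved. Everything here is hypothetical and only for illustration: the toy `NaiveBayesWrapper` class with a `params` slot, the single `.rds` file format, and the default `mode = "error"` are assumptions. Real SparkR wrappers hold a reference to a JVM pipeline model and would presumably persist through the JVM side rather than `saveRDS`.

```r
library(methods)

# Hypothetical stand-in for a SparkR wrapper; a real wrapper would hold a
# reference to a JVM pipeline model rather than a plain R list.
setClass("NaiveBayesWrapper", slots = c(params = "list"))

# ml.save takes the model object, so it can be an S4 generic that
# dispatches on the wrapper class. mode = "error" is an assumed default.
setGeneric("ml.save", function(object, path, mode = "error") {
  standardGeneric("ml.save")
})

setMethod("ml.save", "NaiveBayesWrapper", function(object, path, mode = "error") {
  # Refuse to clobber an existing file unless the caller asks to overwrite.
  if (file.exists(path) && mode != "overwrite") {
    stop("path already exists: ", path)
  }
  # Store the class name alongside the payload so ml.load knows what to rebuild.
  saveRDS(list(class = as.character(class(object)), params = object@params), path)
  invisible(path)
})

# ml.load receives only a path, so there is no object to dispatch on;
# it is an ordinary function that reads the stored class tag.
ml.load <- function(path) {
  stored <- readRDS(path)
  new(stored$class, params = stored$params)
}
```

Usage mirrors the proposed API: `ml.save(model, path, mode = "overwrite")` followed by `model2 <- ml.load(path)`. In this sketch, a second `ml.save` to the same path without `mode = "overwrite"` raises an error.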



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
