[ https://issues.apache.org/jira/browse/SPARK-23154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joseph K. Bradley updated SPARK-23154:
--
Description:
We have (as far as I know) maintained backwards compatibility for ML
persistence, but this is not documented anywhere. I'd like us to document it
(for spark.ml, not for spark.mllib).
I'd recommend something like:
{quote}
In general, MLlib maintains backwards compatibility for ML persistence. I.e.,
if you save an ML model or Pipeline in one version of Spark, then you should be
able to load it back and use it in a future version of Spark. However, there
are rare exceptions, described below.
Model persistence: Is a model or Pipeline saved using Apache Spark ML
persistence in Spark version X loadable by Spark version Y?
* Major versions: No guarantees, but best-effort.
* Minor and patch versions: Yes; these are backwards compatible.
* Note about the format: There are no guarantees for a stable persistence
format, but model loading itself is designed to be backwards compatible.
Model behavior: Does a model or Pipeline in Spark version X behave identically
in Spark version Y?
* Major versions: No guarantees, but best-effort.
* Minor and patch versions: Identical behavior, except for bug fixes.
For both model persistence and model behavior, any breaking changes across a
minor version or patch version are reported in the Spark version release notes.
If a breakage is not reported in release notes, then it should be treated as a
bug to be fixed.
{quote}
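One way to read the proposed policy is as a small decision table. The Python sketch below maps a pair of Spark versions (the one a model or Pipeline was saved with, and the one loading it) to the guarantees stated in the quote. `expected_compatibility` and `Expectation` are hypothetical names used only for illustration and are not part of Spark's API. The sketch also assumes plain `major.minor.patch` version strings such as `2.3.0`.

```python
from typing import NamedTuple, Tuple


class Expectation(NamedTuple):
    """What the proposed policy promises for one (saved, loaded) version pair."""
    persistence: str  # Can the saved model/Pipeline be loaded?
    behavior: str     # Will the loaded model behave identically?


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse a plain 'major.minor.patch' string (e.g. '2.3.0') into ints.

    Assumption: no suffixes such as '-SNAPSHOT'.
    """
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def expected_compatibility(saved_in: str, loaded_in: str) -> Expectation:
    """Hypothetical helper encoding the proposed ML persistence policy."""
    saved = parse_version(saved_in)
    loaded = parse_version(loaded_in)
    if loaded < saved:
        # The proposal only covers loading in the same or a *future* version.
        return Expectation("not covered", "not covered")
    if loaded[0] != saved[0]:
        # Crossing a major version: no guarantees, but best-effort.
        return Expectation("best-effort", "best-effort")
    # Same major version (minor/patch upgrade): loading is backwards compatible,
    # and any breaking change should appear in the release notes.
    return Expectation("backwards compatible", "identical except for bug fixes")


if __name__ == "__main__":
    print(expected_compatibility("2.3.0", "2.4.1"))
    print(expected_compatibility("2.3.0", "3.0.0"))
```

Tuple comparison keeps the ordering check simple. Under this reading, a Pipeline saved in 2.3.0 and loaded in 2.4.1 falls under the "backwards compatible" case, while loading it in 3.0.0 is only best-effort.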
How does this sound?
Note: We unfortunately don't have tests for backwards compatibility (which has
technical hurdles and can be discussed in [SPARK-15573]). However, we have
made efforts to maintain it during PR review and Spark release QA, and most
users expect it.
> Document backwards compatibility guarantees for ML persistence
> --
>
> Key: SPARK-23154
> URL: https://issues.apache.org/jira/browse/SPARK-23154
> Project: Spark
> Issue Type: Documentation
> Components: Documentation, ML
> Affects Versions: 2.3.0
> Reporter: Joseph K. Bradley
> Assignee: Joseph K. Bradley
> Priority: Major