[jira] [Commented] (SPARK-23154) Document backwards compatibility guarantees for ML persistence
[ https://issues.apache.org/jira/browse/SPARK-23154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361654#comment-16361654 ]

Apache Spark commented on SPARK-23154:
--------------------------------------

User 'jkbradley' has created a pull request for this issue:
https://github.com/apache/spark/pull/20592

> Document backwards compatibility guarantees for ML persistence
> --------------------------------------------------------------
>
> Key: SPARK-23154
> URL: https://issues.apache.org/jira/browse/SPARK-23154
> Project: Spark
> Issue Type: Documentation
> Components: Documentation, ML
> Affects Versions: 2.3.0
> Reporter: Joseph K. Bradley
> Assignee: Joseph K. Bradley
> Priority: Major
>
> We have (as far as I know) maintained backwards compatibility for ML
> persistence, but this is not documented anywhere. I'd like us to document it
> (for spark.ml, not for spark.mllib).
> I'd recommend something like:
> {quote}
> In general, MLlib maintains backwards compatibility for ML persistence.
> I.e., if you save an ML model or Pipeline in one version of Spark, then you
> should be able to load it back and use it in a future version of Spark.
> However, there are rare exceptions, described below.
> Model persistence: Is a model or Pipeline saved using Apache Spark ML
> persistence in Spark version X loadable by Spark version Y?
> * Major versions: No guarantees, but best-effort.
> * Minor and patch versions: Yes; these are backwards compatible.
> * Note about the format: There are no guarantees for a stable persistence
> format, but model loading itself is designed to be backwards compatible.
> Model behavior: Does a model or Pipeline in Spark version X behave
> identically in Spark version Y?
> * Major versions: No guarantees, but best-effort.
> * Minor and patch versions: Identical behavior, except for bug fixes.
> For both model persistence and model behavior, any breaking changes across a
> minor version or patch version are reported in the Spark version release
> notes. If a breakage is not reported in release notes, then it should be
> treated as a bug to be fixed.
> {quote}
> How does this sound?
> Note: We unfortunately don't have tests for backwards compatibility (which
> has technical hurdles and can be discussed in [SPARK-15573]). However, we
> have made efforts to maintain it during PR review and Spark release QA, and
> most users expect it.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
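
For illustration only, the persistence rules in the quoted proposal can be read as a small decision function. This is a sketch and not part of Spark: the helper names `parse_version` and `load_guarantee` and the three return labels are hypothetical, and it assumes plain numeric version strings such as "2.3.0" (no "-SNAPSHOT" or similar suffixes). Loading a model into an older Spark version than the one that saved it is not addressed by the proposal, so the sketch reports that case separately.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse 'major.minor.patch' (or 'major.minor') into a tuple of ints.

    Hypothetical helper; assumes purely numeric components.
    """
    parts = [int(p) for p in version.split(".")]
    while len(parts) < 3:
        parts.append(0)  # treat a missing patch level as 0
    return parts[0], parts[1], parts[2]


def load_guarantee(saved_in: str, loaded_in: str) -> str:
    """Classify loading a model saved by `saved_in` into `loaded_in`.

    Returns one of three illustrative labels:
      'guaranteed'  - same major version, loader is the same or newer
      'best-effort' - loader is a newer major version
      'not covered' - loader is older than the saver (outside the proposal)
    """
    saved = parse_version(saved_in)
    loaded = parse_version(loaded_in)
    if loaded < saved:
        return "not covered"
    if loaded[0] != saved[0]:
        return "best-effort"
    return "guaranteed"


if __name__ == "__main__":
    for saved, loaded in [("2.2.0", "2.3.0"), ("2.3.0", "3.0.0"), ("2.3.0", "2.2.1")]:
        print(f"saved in {saved}, loaded in {loaded}: {load_guarantee(saved, loaded)}")
```

Even in the 'guaranteed' case the proposal allows rare exceptions, which are expected to be listed in the release notes, so a check like this can only describe the stated policy and cannot confirm that a particular model actually loads.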
[jira] [Commented] (SPARK-23154) Document backwards compatibility guarantees for ML persistence
[ https://issues.apache.org/jira/browse/SPARK-23154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361575#comment-16361575 ]

Joseph K. Bradley commented on SPARK-23154:
-------------------------------------------

I'd prefer to put it in the subsection on saving & loading. I'll send a PR now.

[~yanboliang] I actually spent a long time trying to come up with ways to test this, and it's non-trivial. The main blocker is that I got pushback from others about putting binary files (Parquet model data files) in the git repo. Without that, there isn't a way to store example models from past versions. I may just build a separate project to test this outside of apache/spark itself when I get the chance. You can find more notes in the JIRA linked in the description above.
[jira] [Commented] (SPARK-23154) Document backwards compatibility guarantees for ML persistence
[ https://issues.apache.org/jira/browse/SPARK-23154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16344885#comment-16344885 ]

Nick Pentreath commented on SPARK-23154:
----------------------------------------

Where do we intend to put this note? In [http://spark.apache.org/docs/latest/ml-pipeline.html#saving-and-loading-pipelines]? Or as a new section in [http://spark.apache.org/docs/latest/ml-guide.html]?
[jira] [Commented] (SPARK-23154) Document backwards compatibility guarantees for ML persistence
[ https://issues.apache.org/jira/browse/SPARK-23154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16334757#comment-16334757 ]

Yanbo Liang commented on SPARK-23154:
-------------------------------------

Sounds good! It should be helpful to document backwards compatibility. Furthermore, I think we can write some tools to test backwards compatibility for ML persistence during release QA, just like the performance regression tests. Thanks.
[jira] [Commented] (SPARK-23154) Document backwards compatibility guarantees for ML persistence
[ https://issues.apache.org/jira/browse/SPARK-23154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16332252#comment-16332252 ]

Nick Pentreath commented on SPARK-23154:
----------------------------------------

SGTM
[jira] [Commented] (SPARK-23154) Document backwards compatibility guarantees for ML persistence
[ https://issues.apache.org/jira/browse/SPARK-23154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16331811#comment-16331811 ]

Joseph K. Bradley commented on SPARK-23154:
-------------------------------------------

CC [~mlnick], [~yanboliang] or others, what do you think?