[ https://issues.apache.org/jira/browse/SPARK-15573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15836604#comment-15836604 ]
Asher Krim edited comment on SPARK-15573 at 1/24/17 8:41 PM:
-------------------------------------------------------------

Thanks for your comment Joseph

What I mean by "coupling" is that relying on the Spark version means there is a time element to a PR which, to me at least, feels foreign. Imagine a committer who uses Spark 2.2 as the version for a fix, but the PR gets delayed to 2.3. Now a code change is needed, yet there might be no test or compile errors to make that obvious.

Regarding backporting, what I have in mind is specifically making bugfixes back-portable to a future 1.6.x release. For example, there are bugs in the word2vec and LDA model save logic. Relying on the Spark version makes the fix clunky to backport (the code would have to differ to account for the different Spark versions).

I agree that relying on columns/content can be problematic for the reason you mentioned. Relying on an internal version flag, as in MLlib, seems like the lesser evil. I'm wondering: is there a particular reason it was abandoned in ML?


> Backwards-compatible persistence for spark.ml
> ---------------------------------------------
>
>                 Key: SPARK-15573
>                 URL: https://issues.apache.org/jira/browse/SPARK-15573
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>            Reporter: Joseph K. Bradley
>
> This JIRA is for imposing backwards-compatible persistence on the DataFrames-based API for MLlib, i.e., we want to be able to load models saved in previous versions of Spark. We will not require loading models saved in later versions of Spark.
> This requires:
> * Putting unit tests in place to check loading models from previous versions
> * Notifying all committers active on MLlib to be aware of this requirement going forward
> The unit tests could be written as in spark.mllib, where we essentially copied and pasted the save() code every time it changed. This happens rarely, so it should be acceptable, though other designs are fine.
> Subtasks of this JIRA should cover checking and adding tests for existing cases, such as KMeansModel (whose format changed between 1.6 and 2.0).
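For reference, here is a minimal sketch of the "internal version flag" approach contrasted above with keying persistence off the Spark version. This is illustrative only, not actual spark.ml or spark.mllib code: the names (MyModelWriter, MyModelReader, formatVersion) and the flat-file layout are hypothetical, and a real implementation would write through Spark's DataFrame writers and a JSON parser. The writer stamps an explicit format version into the saved metadata, and the reader dispatches on that flag, so the same reader works on any branch regardless of which Spark release it ships in:

{code:scala}
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, Paths}

object MyModelWriter {
  // Bump this constant whenever the on-disk layout changes,
  // independently of the Spark release the change ships in.
  val FormatVersion = 2

  def save(path: String, weights: Array[Double]): Unit = {
    val dir = Files.createDirectories(Paths.get(path))
    // Stamp an explicit format version into the metadata.
    val metadata = s"""{"formatVersion":$FormatVersion}"""
    Files.write(dir.resolve("metadata.json"),
      metadata.getBytes(StandardCharsets.UTF_8))
    Files.write(dir.resolve("data.csv"),
      weights.mkString(",").getBytes(StandardCharsets.UTF_8))
  }
}

object MyModelReader {
  def load(path: String): Array[Double] = {
    val dir = Paths.get(path)
    val metadata = new String(
      Files.readAllBytes(dir.resolve("metadata.json")), StandardCharsets.UTF_8)
    // A regex keeps this sketch dependency-free; real code would parse the JSON.
    val version = "\"formatVersion\":(\\d+)".r
      .findFirstMatchIn(metadata).map(_.group(1).toInt).getOrElse(1)
    // Dispatch on the stored flag, never on the Spark version.
    version match {
      case 1 => loadV1(dir) // legacy layout, kept so old saves remain readable
      case 2 => loadV2(dir) // current layout
      case v => throw new IllegalArgumentException(
        s"Cannot load model saved with unknown format version $v")
    }
  }

  // Hypothetical per-version readers; only the dispatch above grows
  // when a new format version is introduced.
  private def loadV1(dir: Path): Array[Double] =
    loadV2(dir) // in this sketch, v1 happened to share the data layout
  private def loadV2(dir: Path): Array[Double] = {
    val data = new String(
      Files.readAllBytes(dir.resolve("data.csv")), StandardCharsets.UTF_8)
    if (data.isEmpty) Array.empty[Double] else data.split(",").map(_.toDouble)
  }
}
{code}

Under this scheme, a backported bugfix only adds or adjusts one case in the load dispatch, which avoids the divergent per-branch code that version-number checks against the Spark release would require.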