[ https://issues.apache.org/jira/browse/SPARK-15573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15836604#comment-15836604 ]
Asher Krim edited comment on SPARK-15573 at 1/24/17 8:41 PM:
-------------------------------------------------------------

Thanks for your comment Joseph

What I mean by "coupling" is that relying on the Spark version means there is a time element to a PR which, to me at least, feels foreign. Imagine a committer who uses Spark 2.2 as the version for a fix, but the PR gets delayed to 2.3. Now a code change is needed, yet there might be no test or compile errors to make that obvious.

Regarding backporting, what I have in mind is specifically making bugfixes back-portable to a future 1.6.x release. For example, there are bugs in the word2vec and LDA model save logic. Relying on the Spark version makes the fix clunky to backport (the code would have to differ to account for the different Spark versions).

I agree that relying on columns/content can be problematic for the reason you mentioned. Relying on an internal version flag, as in MLlib, seems like the lesser evil. I'm wondering: is there a particular reason it was abandoned in ML?


> Backwards-compatible persistence for spark.ml
> ---------------------------------------------
>
>                 Key: SPARK-15573
>                 URL: https://issues.apache.org/jira/browse/SPARK-15573
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>            Reporter: Joseph K. Bradley
>
> This JIRA is for imposing backwards-compatible persistence on the DataFrames-based API for MLlib, i.e., we want to be able to load models saved in previous versions of Spark. We will not require loading models saved in later versions of Spark.
> This requires:
> * Putting unit tests in place to check loading models from previous versions
> * Notifying all committers active on MLlib to be aware of this requirement going forward
> The unit tests could be written as in spark.mllib, where we essentially copied and pasted the save() code every time it changed. This happens rarely, so it should be acceptable, though other designs are fine.
> Subtasks of this JIRA should cover checking and adding tests for existing cases, such as KMeansModel (whose format changed between 1.6 and 2.0).
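For reference, here is a minimal sketch of the "internal version flag" approach contrasted above with keying persistence off the Spark version. This is illustrative only, not actual spark.ml or spark.mllib code: the names (MyModelWriter, MyModelReader, formatVersion) and the flat-file layout are hypothetical, and a real implementation would write through Spark's DataFrame writers and a JSON parser. The writer stamps an explicit format version into the saved metadata, and the reader dispatches on that flag, so the same reader works on any branch regardless of which Spark release it ships in:

{code:scala}
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, Paths}

object MyModelWriter {
  // Bump this constant whenever the on-disk layout changes,
  // independently of the Spark release the change ships in.
  val FormatVersion = 2

  def save(path: String, weights: Array[Double]): Unit = {
    val dir = Files.createDirectories(Paths.get(path))
    // Stamp an explicit format version into the metadata.
    val metadata = s"""{"formatVersion":$FormatVersion}"""
    Files.write(dir.resolve("metadata.json"),
      metadata.getBytes(StandardCharsets.UTF_8))
    Files.write(dir.resolve("data.csv"),
      weights.mkString(",").getBytes(StandardCharsets.UTF_8))
  }
}

object MyModelReader {
  def load(path: String): Array[Double] = {
    val dir = Paths.get(path)
    val metadata = new String(
      Files.readAllBytes(dir.resolve("metadata.json")), StandardCharsets.UTF_8)
    // A regex keeps this sketch dependency-free; real code would parse the JSON.
    val version = "\"formatVersion\":(\\d+)".r
      .findFirstMatchIn(metadata).map(_.group(1).toInt).getOrElse(1)
    // Dispatch on the stored flag, never on the Spark version.
    version match {
      case 1 => loadV1(dir) // legacy layout, kept so old saves remain readable
      case 2 => loadV2(dir) // current layout
      case v => throw new IllegalArgumentException(
        s"Cannot load model saved with unknown format version $v")
    }
  }

  // Hypothetical per-version readers; only the dispatch above grows
  // when a new format version is introduced.
  private def loadV1(dir: Path): Array[Double] =
    loadV2(dir) // in this sketch, v1 happened to share the data layout
  private def loadV2(dir: Path): Array[Double] = {
    val data = new String(
      Files.readAllBytes(dir.resolve("data.csv")), StandardCharsets.UTF_8)
    if (data.isEmpty) Array.empty[Double] else data.split(",").map(_.toDouble)
  }
}
{code}

Under this scheme, a backported bugfix only adds or adjusts one case in the load dispatch, which avoids the divergent per-branch code that version-number checks against the Spark release would require.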