galenwarren commented on pull request #15599:
URL: https://github.com/apache/flink/pull/15599#issuecomment-1017633539


   @sap1ens I'm glad that you've found this useful! I wanted to give you a 
heads up about one thing, though: the final version of the code may not be able 
to read savepoint/checkpoint data written by earlier versions of the code.
   
   Without going into too much detail, there are two types that get written 
into savepoint/checkpoint data -- `ResumeRecoverable` and `CommitRecoverable`. 
In the original implementation, these were implemented by the same class, used 
the same serializer, and so had the same serialized format. Later in the 
project, they were split into separate classes, with separate serializers and 
separate (though related) serialized formats. And, while 
`SimpleVersionedSerializer` does support versioning, I did not bump the version 
of the serializers in the final code -- they are still at version 0. So, during 
deserialization, there is no simple way to determine whether 
savepoint/checkpoint data for these recoverables was serialized the old way or 
the new way.
   
   I'm not sure exactly how you're using this now -- if you have the ability to 
stop jobs and restart them without starting from a savepoint, then there should 
be no problem.
   
   If you will need to start from a savepoint when moving to the new version of 
the code, you could have a problem reading the savepoint data for any 
in-progress writes to GCS. One pretty simple thing we could do to make this 
easier would be to bump the serializer versions in the to-be-released code to 
1, instead of 0. I don't think that would make any difference to new users, but 
it would allow you to distinguish the two formats in a fork so that you could 
read the old format properly (with a bit of extra code).
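
   To illustrate the idea, here is a minimal sketch of how a bumped version 
number could be used to pick the right reader. It does not depend on Flink: 
`VersionedSerializer` is a local stand-in that mirrors the shape of 
`SimpleVersionedSerializer`, and `ResumeState`, `serializeLegacy`, and both 
byte layouts are hypothetical, made up for the example rather than taken from 
the actual GCS recoverable formats.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical stand-in for a recoverable; not the real Flink GCS class.
final class ResumeState {
    final String blobName;
    final long position;

    ResumeState(String blobName, long position) {
        this.blobName = blobName;
        this.position = position;
    }
}

// Local interface with the same shape as SimpleVersionedSerializer,
// declared here so the sketch compiles without Flink on the classpath.
interface VersionedSerializer<T> {
    int getVersion();

    byte[] serialize(T value) throws IOException;

    T deserialize(int version, byte[] serialized) throws IOException;
}

final class ResumeStateSerializer implements VersionedSerializer<ResumeState> {
    static final int LEGACY_VERSION = 0;  // data written by the earlier code
    static final int CURRENT_VERSION = 1; // bumped so new data is distinguishable

    @Override
    public int getVersion() {
        return CURRENT_VERSION;
    }

    // Assumed "new" layout for this sketch: name first, then position.
    @Override
    public byte[] serialize(ResumeState state) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(state.blobName);
            out.writeLong(state.position);
        }
        return bytes.toByteArray();
    }

    // Assumed "old" layout for this sketch: position first, then name.
    // Only here so the example can produce legacy bytes to read back.
    static byte[] serializeLegacy(ResumeState state) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(state.position);
            out.writeUTF(state.blobName);
        }
        return bytes.toByteArray();
    }

    // The version stored alongside the data selects the matching reader.
    @Override
    public ResumeState deserialize(int version, byte[] serialized) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(serialized))) {
            switch (version) {
                case LEGACY_VERSION: {
                    long position = in.readLong();
                    String blobName = in.readUTF();
                    return new ResumeState(blobName, position);
                }
                case CURRENT_VERSION: {
                    String blobName = in.readUTF();
                    long position = in.readLong();
                    return new ResumeState(blobName, position);
                }
                default:
                    throw new IOException("Unknown serializer version: " + version);
            }
        }
    }
}
```

   With both formats stuck at version 0, the `switch` above would have nothing 
to branch on, which is the problem described earlier. The legacy branch is the 
"bit of extra code" that would live in a fork.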
   
   I'd have to run this by @xintongsong, though, since he's already merged the 
code.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

