galenwarren commented on pull request #15599: URL: https://github.com/apache/flink/pull/15599#issuecomment-1017633539
@sap1ens I'm glad that you've found this useful! I wanted to give you a heads up about one thing, though: the final version of the code may have a problem reading savepoint/checkpoint data written by earlier versions of the code.

Without going into too much detail, there are two types that get written into savepoint/checkpoint data: `ResumeRecoverable` and `CommitRecoverable`. In the original implementation, these were implemented by the same class, used the same serializer, and so had the same serialized format. Later in the project, they were changed to be implemented by separate classes, with separate serializers and separate (though related) serialized formats. And, while `SimpleVersionedSerializer` does support versioning, I did not bump the version of the serializers in the final code; they are still at version 0. So, during deserialization, it would not be simple to determine whether savepoint/checkpoint data for these recoverables was serialized the old way or the new way.

I'm not sure exactly how you're using this now. If you have the ability to stop jobs and restart them without starting from a savepoint, then there should be no problem. If you need to start from a savepoint when moving to the new version of the code, you could have a problem reading the savepoint data for any in-progress writes to GCS.

One pretty simple thing we could do to make this easier would be to bump the serializer versions in the to-be-released code to 1, instead of 0. I don't think that would make any difference to new users, but it would allow you to distinguish the two formats in a fork so that you could read the old format properly (with a bit of extra code). I'd have to run this by @xintongsong, though, since he's already merged the code.
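To make the version-bump idea a bit more concrete, here is a rough sketch of how a fork could dispatch on the serializer version. This is not the actual Flink code: `VersionedSerializer` is a local stand-in with the same shape as `SimpleVersionedSerializer`, and `DemoRecoverable`, its fields, and both byte layouts are made up purely for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Local stand-in with the same shape as Flink's SimpleVersionedSerializer,
// so that this sketch compiles without Flink on the classpath.
interface VersionedSerializer<E> {
    int getVersion();
    byte[] serialize(E obj) throws IOException;
    E deserialize(int version, byte[] serialized) throws IOException;
}

// Hypothetical recoverable; the real recoverables hold different fields.
final class DemoRecoverable {
    final String finalBlob;
    final List<String> componentBlobs;

    DemoRecoverable(String finalBlob, List<String> componentBlobs) {
        this.finalBlob = finalBlob;
        this.componentBlobs = componentBlobs;
    }
}

final class DemoRecoverableSerializer implements VersionedSerializer<DemoRecoverable> {
    static final int LEGACY_VERSION = 0;  // data written by the earlier code
    static final int CURRENT_VERSION = 1; // bumped so the two formats can be told apart

    @Override
    public int getVersion() {
        return CURRENT_VERSION;
    }

    @Override
    public byte[] serialize(DemoRecoverable r) throws IOException {
        // Assumed "new" layout: blob name, component count, then each component name.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(r.finalBlob);
            out.writeInt(r.componentBlobs.size());
            for (String component : r.componentBlobs) {
                out.writeUTF(component);
            }
        }
        return bytes.toByteArray();
    }

    @Override
    public DemoRecoverable deserialize(int version, byte[] serialized) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(serialized))) {
            switch (version) {
                case LEGACY_VERSION:
                    // Assumed "old" layout: only the blob name.
                    return new DemoRecoverable(in.readUTF(), new ArrayList<>());
                case CURRENT_VERSION:
                    String finalBlob = in.readUTF();
                    int count = in.readInt();
                    List<String> components = new ArrayList<>();
                    for (int i = 0; i < count; i++) {
                        components.add(in.readUTF());
                    }
                    return new DemoRecoverable(finalBlob, components);
                default:
                    throw new IOException("Unknown serializer version: " + version);
            }
        }
    }

    // Helper that produces bytes in the assumed "old" layout, for trying out the legacy path.
    static byte[] writeLegacy(String finalBlob) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(finalBlob);
        }
        return bytes.toByteArray();
    }
}
```

The point is only that the version passed to `deserialize` tells the reader which layout to expect. With both formats stuck at version 0, that switch has nothing to branch on.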