tzulitai edited a comment on pull request #13773:
URL: https://github.com/apache/flink/pull/13773#issuecomment-717730004


   @Antti-Kaikkonen thank you for trying the branch out.
   
   I think the exceptions you encountered are expected in the experiments 
you've tried out.
   
   Could you adjust your experiment as follows, and then report back?
   
   Try out your application `FlinkStatefunCountTo1M` with a new build of 
StateFun that includes the changes in 
https://github.com/apache/flink-statefun/pull/168.
   
   You should be able to just pull that branch, do a clean build (`mvn clean 
install -DskipTests`), and then change the StateFun dependency in your 
application to `2.3-SNAPSHOT`.
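   
   For the dependency change, here is a minimal sketch of the relevant 
`pom.xml` entry. It assumes your application pulls in StateFun through the 
`statefun-sdk` artifact; if your project declares other StateFun artifacts, 
those would need the same version bump:
   
   ```xml
   <!-- Assumption: the application depends on statefun-sdk; adjust the
        artifactId to whichever StateFun artifacts your pom.xml declares. -->
   <dependency>
     <groupId>org.apache.flink</groupId>
     <artifactId>statefun-sdk</artifactId>
     <!-- Snapshot installed into the local Maven repository by the build above -->
     <version>2.3-SNAPSHOT</version>
   </dependency>
   ```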
   
   Then take a savepoint and try to restore from it, as you did in your 
previous test.
   Note that you should not need to apply any Flink fixes for this.
   
   ---
   
   Let me briefly explain our release plans here to address the issue you 
reported, and why the above adjustment makes sense:
   
   1. With the StateFun changes in 
https://github.com/apache/flink-statefun/pull/168 (and not including ANY Flink 
changes), we're expecting that restoring from checkpoints / savepoints should 
work properly now for all checkpoints / savepoints taken with a new version 
that includes https://github.com/apache/flink-statefun/pull/168. This would 
already address FLINK-19692, and we're planning to push out a StateFun hotfix 
release immediately to unblock you and other users that may be encountering the 
same issue.
   
   2. What https://github.com/apache/flink-statefun/pull/168 doesn't yet solve 
is the ability to safely restore / upgrade from a savepoint taken with StateFun 
versions <= 2.2.0. This does not affect you if you don't have StateFun 
applications running in production yet. Enabling this requires the fixes in 
this PR and #13761 to land in Flink, a new Flink release, and finally another 
follow-up StateFun hotfix release that uses the new Flink version. That is a 
lengthier process, with an estimate of another 3-4 weeks, so we decided to go 
ahead with the above option first to move faster.
   
   ---
   
   TL;DR: It would be tremendously helpful if you could redo your experiment 
using only a new StateFun build that includes 
https://github.com/apache/flink-statefun/pull/168, without any Flink changes. 
Please do let me know the results!



