Github user alpinegizmo commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3258#discussion_r99369385

    --- Diff: docs/ops/upgrading.md ---
    @@ -35,23 +35,105 @@ under the License.

     ## Upgrading the Flink Framework Version

    -  - Either "in place" : Savepoint -> stop/cancel -> shutdown cluster -> start new version -> start job
    -  - Another cluster variant : Savepoint -> resume in other cluster -> "flip switch" -> shutdown old cluster
    +This section describes the general way of upgrading Flink framework version from version 1.1.x to 1.2.x and migrating your
    +jobs between the two versions.
    +
    +In a nutshell, this procedure consists of 2 fundamental steps:
    +
    +1. Take a savepoint in Flink 1.1.x for the jobs you want to migrate.
    +2. Resume your jobs under Flink 1.2.x from the previously taken savepoints.
    +
    +Besides those two fundamental steps, some additional steps can be required that depend on the way you want to change the
    +Flink version. In this guide we differentiate two approaches to upgrade from Flink 1.1.x to 1.2.x: **inplace** upgrade and
    +**shadow copy** upgrade.
    +
    +For **inplace** update, after taking savepoints, you need to:
    +
    + 1. Stop/cancel all running jobs.
    + 2. Shutdown the cluster that runs Flink 1.1.x.
    + 3. Upgrade Flink to 1.2.x. on the cluster.
    + 4. Restart the cluster under the new version.
    +
    +For **shadow copy**, you need to:
    +
    + 1. Before resuming from the savepoint, setup a new installation of Flink 1.2.x besides your old Flink 1.1.x installation.
    + 2. Resume from the savepoints with the new Flink 1.2.x installation.
    + 3. If everything runs ok, stop and shutdown the old Flink 1.1.x cluster.
    +
    +In the following, we will first present the preconditions for successful job migration and then go into more detail
    +about the steps that we outlined before.
    +
    +### Preconditions
    +
    +Before starting the migration, please check that the jobs you are trying to migrate are following the
    +best-practises for [savepoints]({{ site.baseurl }}/setup/savepoints.html). In particular, we advise you to check that
    +explicit `uid`s were set for operators in your job.
    +
    +This is a *soft* precondition, and restore *should* still work in case you forgot about assigning `uid`s.
    +If you run into a case where this is not working, you can *manually* add the generated legacy vertex ids from Flink 1.1
    +to your job using the `setUidHash(String hash)` call. For each operator (in operator chains: only head operator) you
    +must assign the 32 character hex string representing the hash that you can see in the web ui or logs for the operator.
    +
    +Besides operator uids, there are currently three *hard* preconditions for job migration that will make migration fail:
    +
    +1. as mentioned in earlier release notes, we do not support migration for state in RocksDB that was checkpointed using
    +`semi-asynchronous` mode. In case your old job was using this mode, you can still change your job to use
    +`fully-asynchronous` mode before taking the savepoint that is used as basis for the migration.
    +
    +2. The CEP operator is currently not supported for migration. If your job uses this operator you can (curently) not
    +migrate it. We are planning to provide migration support for CEP operator in a later bugfix release.
    --- End diff --

    for the CEP operator
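
A side note on the `setUidHash(String hash)` paragraph in the diff above: since the hash has to be copied by hand from the web UI or the logs, a small sanity check before wiring it into the job might save a failed restore. The sketch below is only an illustration and is not part of Flink; the class `UidHashCheck`, the method `isValidUidHash`, and the sample hash value are all hypothetical, and the only assumption taken from the text is that a legacy vertex id is a 32 character hex string.

```java
import java.util.regex.Pattern;

public class UidHashCheck {

    // Assumption from the text above: a legacy vertex id is exactly
    // 32 hexadecimal characters. This pattern is illustrative, not Flink's own check.
    private static final Pattern HEX_32 = Pattern.compile("[0-9a-fA-F]{32}");

    /** Returns true if the given string looks like a 32 character hex hash. */
    public static boolean isValidUidHash(String hash) {
        return hash != null && HEX_32.matcher(hash).matches();
    }

    public static void main(String[] args) {
        // Placeholder value, not a real vertex id from any job.
        String copiedFromWebUi = "0123456789abcdef0123456789abcdef";

        System.out.println(isValidUidHash(copiedFromWebUi)); // prints true
        System.out.println(isValidUidHash("not-a-hash"));    // prints false
    }
}
```

A value that passes this check would then be handed to `setUidHash(...)` on the head operator of the chain, as the paragraph describes; a value that fails was most likely truncated or mistyped while copying.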