Github user alpinegizmo commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3258#discussion_r99370593
  
    --- Diff: docs/ops/upgrading.md ---
    @@ -35,23 +35,105 @@ under the License.
     
     ## Upgrading the Flink Framework Version
     
    -  - Either "in place" : Savepoint -> stop/cancel -> shutdown cluster -> start new version -> start job
    -  - Another cluster variant : Savepoint -> resume in other cluster -> "flip switch" -> shutdown old cluster
    +This section describes the general way of upgrading the Flink framework version from 1.1.x to 1.2.x and migrating your
    +jobs between the two versions.
    +
    +In a nutshell, this procedure consists of two fundamental steps:
    +
    +1. Take a savepoint in Flink 1.1.x for the jobs you want to migrate.
    +2. Resume your jobs under Flink 1.2.x from the previously taken savepoints.
    +
    +Besides these two fundamental steps, additional steps can be required, depending on how you want to change the
    +Flink version. In this guide we differentiate between two approaches to upgrading from Flink 1.1.x to 1.2.x: **in-place** upgrade and
    +**shadow copy** upgrade.
    +
    +For an **in-place** upgrade, after taking the savepoints, you need to:
    +
    +  1. Stop/cancel all running jobs.
    +  2. Shut down the cluster that runs Flink 1.1.x.
    +  3. Upgrade Flink to 1.2.x on the cluster.
    +  4. Restart the cluster under the new version (see the sketch below for a standalone cluster).
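    +
    +For example, on a standalone cluster the sequence could look roughly like the following sketch; `:jobId` is a placeholder, and other deployment modes use different commands:
    +
    +```sh
    +# Take a savepoint of the running job first (see STEP 1 below); :jobId is a placeholder.
    +$ bin/flink savepoint :jobId
    +# 1) Cancel the job once the savepoint has completed.
    +$ bin/flink cancel :jobId
    +# 2) Shut down the Flink 1.1.x standalone cluster.
    +$ bin/stop-cluster.sh
    +# 3) Replace the installation with Flink 1.2.x (see STEP 2 below), then
    +# 4) restart the cluster under the new version.
    +$ bin/start-cluster.sh
    +```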
    +
    +For a **shadow copy** upgrade, you need to:
    +
    +  1. Before resuming from the savepoint, set up a new installation of Flink 1.2.x beside your old Flink 1.1.x installation.
    +  2. Resume from the savepoints with the new Flink 1.2.x installation (see the sketch below).
    +  3. If everything runs OK, stop and shut down the old Flink 1.1.x cluster.
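    +
    +As a minimal sketch, assuming a standalone setup with placeholder installation paths, savepoint path, and job jar:
    +
    +```sh
    +# 1) Start the new Flink 1.2.x cluster beside the old one
    +#    (you may need to adjust ports in the configuration so both clusters can run side by side).
    +$ /opt/flink-1.2.0/bin/start-cluster.sh
    +# 2) Resume the job from the savepoint on the new installation.
    +$ /opt/flink-1.2.0/bin/flink run -s :savepointPath /path/to/your-job.jar
    +# 3) Once everything runs fine, shut down the old Flink 1.1.x cluster.
    +$ /opt/flink-1.1.4/bin/stop-cluster.sh
    +```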
    +
    +In the following, we will first present the preconditions for successful job migration and then go into more detail
    +about the steps that we outlined before.
    +
    +### Preconditions
    +
    +Before starting the migration, please check that the jobs you are trying to migrate are following the
    +best practices for [savepoints]({{ site.baseurl }}/setup/savepoints.html). In particular, we advise you to check that
    +explicit `uid`s were set for the operators in your job.
    +
    +This is a *soft* precondition, and the restore *should* still work in case you forgot to assign `uid`s.
    +If you run into a case where this is not working, you can *manually* add the generated legacy vertex IDs from Flink 1.1
    +to your job using the `setUidHash(String hash)` call. For each operator (in operator chains: only the head operator) you
    +must assign the 32-character hex string representing the hash that you can see in the web UI or logs for the operator.
    +
    +Besides operator `uid`s, there are currently three *hard* preconditions for job migration; if they are not met, migration will fail:
    +
    +1. As mentioned in earlier release notes, we do not support migration of state in RocksDB that was checkpointed using
    +`semi-asynchronous` mode. In case your old job was using this mode, you can still change your job to use
    +`fully-asynchronous` mode before taking the savepoint that is used as the basis for the migration.
    +
    +2. The CEP operator is currently not supported for migration. If your job uses this operator, you cannot
    +migrate it yet. We plan to provide migration support for the CEP operator in a later bugfix release.
    +
    +3. Another **important** precondition is that all the savepoint data is accessible from the new installation and resides under
    +the same absolute path. Please note that the savepoint data is typically not self-contained in just the created
    +savepoint file. Additional files can be referenced from inside the savepoint file (e.g. the output from state backend
    +snapshots)! There is currently no simple way to identify and move all the data that belongs to a savepoint. (A basic accessibility check is sketched below.)
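    +
    +If your savepoints reside on a distributed file system, one simple sanity check is to verify, from a node of the new installation, that the savepoint data is reachable under the unchanged absolute path; the path below is only an example:
    +
    +```sh
    +# Verify that the savepoint data is accessible under the same absolute path from the new cluster.
    +$ hdfs dfs -ls hdfs:///flink/savepoints/savepoint-abcdef
    +```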
    +
    +
    +### STEP 1: Taking a savepoint in Flink 1.1.x.
    +
    +The first major step in job migration is taking a savepoint of your job running in Flink 1.1.x. You can do this with the
    +command:
    +
    +```sh
    +$ bin/flink savepoint :jobId [:targetDirectory]
    +```
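    +
    +For example, with placeholder values for the job ID and target directory, this could look as follows:
    +
    +```sh
    +# List the running jobs to find the ID of the job you want to migrate.
    +$ bin/flink list -r
    +# Trigger a savepoint for that job; the job ID and target directory are example values.
    +$ bin/flink savepoint 5e20cb6b0f357591171dfcca2eea09de hdfs:///flink/savepoints
    +```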
    +
    +For more details, please read the [savepoint documentation]({{ site.baseurl }}/setup/savepoints.html).
    +
    +### STEP 2: Updating your cluster to Flink 1.2.x.
    +
    +In this step, we update the framework version of the cluster. Basically, this means replacing the content of
    +the Flink installation with the new version. How to do this depends on how you are running Flink in your cluster (e.g.
    +standalone, on Mesos, ...).
    +
    +If you are unfamiliar with installing Flink in your cluster, please read the [deployment and cluster setup documentation]({{ site.baseurl }}/setup/index.html).
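    +
    +Purely as an illustration for a simple standalone setup (archive name and paths are placeholders; the exact distribution depends on your Hadoop and Scala versions), this could look like:
    +
    +```sh
    +# Unpack the new Flink 1.2.x distribution next to the old installation.
    +$ tar xzf flink-1.2.0-bin-hadoop27-scala_2.10.tgz -C /opt
    +# Review the configuration differences and re-apply your custom settings.
    +$ diff /opt/flink-1.1.4/conf/flink-conf.yaml /opt/flink-1.2.0/conf/flink-conf.yaml
    +```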
    +
    +### STEP 3: Resuming the job under Flink 1.2.x from a Flink 1.1.x savepoint.
    +
    +As last step of job migration, you resume the from the taken savepoint on 
a cluster that runs Flink 1.2.x. You can do
    --- End diff --
    
    As the last step of job migration, you resume from the savepoint taken 
above on the updated cluster.

