Github user alpinegizmo commented on a diff in the pull request:

https://github.com/apache/flink/pull/3259#discussion_r99375767

--- Diff: docs/ops/production_ready.md ---
@@ -0,0 +1,88 @@
+---
+title: "Production Readiness Checklist"
+nav-parent_id: setup
+nav-pos: 20
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+* ToC
+{:toc}
+
+## Production Readiness Checklist
+
+Purpose of this production readiness checklist is to provide a condensed overview of configuration options that are
+important and need **careful considerations** if you plan to bring your Flink job into **production**. For most of these options
+Flink provides out-of-the-box defaults to make usage and adoption of Flink easier. For many users and scenarios, those
+defaults are good starting points for development and completely sufficient for "one-shot" jobs.
+
+However, once you are planning to bring a Flink appplication to production the requirements typically increase. For example,
+you want your job to be (re-)scalable and to have a good upgrade story for your job and new Flink versions.
+
+In the following, we present a collection of configuration options that you should check before your job goes into production.
+
+### Set maximum parallelism for operators explicitly
+
+Maximum parallelism is a configuration parameter that is newly introduced in Flink 1.2 and has important implications
+for the (re-)scalability of your Flink job. This parameter, which can be set on a per-job and/or per-operator granularity,
+determines the maximum parallelism to which you can scale operators. It is important to understand that (as of now) there
+is **now way to increase** this parameter after your job was initially started, except for restarting your job completely
+from scratch (i.e. with a new state, and not from a previous checkpoint/savepoint). Even if Flink would provide some way
+to change maximum parallelism for existing savepoints in the future, you can already assume that for large states this is
+likely a long running operation that you want to avoid. At this point, you might wonder why not just to use a very high
+value as default for this parameter. The reason behind this is that high maximum parallelism can have some impact on your
+applications performance and even state sizes, because Flink has to maintain certain meta data for it's ability to rescale which
+can increase with the maximum parallelism. In general, you should chose a max parallelism that is high enough to fit your
--- End diff --

choose
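
The paragraph under review asks readers to pick a maximum parallelism that leaves room to scale without being needlessly high. A minimal sketch of one way to reason about that choice is below. The class `MaxParallelismSketch`, its method names, the 1.5x headroom factor, and the bounds 128 and 32768 are all assumptions made for illustration; none of them come from the diff, and this is not presented as Flink's own logic.

```java
/** Hypothetical helper, not part of Flink: picks a max parallelism with some headroom. */
final class MaxParallelismSketch {

    // Assumed bounds, chosen for illustration only.
    static final int LOWER_BOUND = 128;
    static final int UPPER_BOUND = 32768;

    /** Rounds x up to the next power of two (returns x if it already is one). */
    static int roundUpToPowerOfTwo(int x) {
        if (x <= 1) {
            return 1;
        }
        return Integer.highestOneBit(x - 1) << 1;
    }

    /**
     * Suggests a max parallelism for the highest parallelism the job is
     * expected to need: add 50% headroom, round up to a power of two,
     * then clamp to the assumed bounds.
     */
    static int choose(int expectedPeakParallelism) {
        if (expectedPeakParallelism < 1) {
            throw new IllegalArgumentException("parallelism must be >= 1");
        }
        if (expectedPeakParallelism >= UPPER_BOUND) {
            return UPPER_BOUND; // also avoids int overflow below
        }
        int withHeadroom = expectedPeakParallelism + expectedPeakParallelism / 2;
        int candidate = roundUpToPowerOfTwo(withHeadroom);
        return Math.min(Math.max(candidate, LOWER_BOUND), UPPER_BOUND);
    }

    public static void main(String[] args) {
        for (int p : new int[] {10, 100, 200, 30000}) {
            System.out.println(p + " -> " + choose(p));
        }
    }
}
```

The chosen value would then be set once, before the job first starts, for example through `setMaxParallelism(int)` on the `StreamExecutionEnvironment` for the whole job or on an individual operator, since the text notes it cannot be increased afterwards without discarding state.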