tillrohrmann commented on a change in pull request #14254: URL: https://github.com/apache/flink/pull/14254#discussion_r532464322
##########
File path: docs/deployment/ha/zookeeper_ha.md
##########
@@ -23,113 +23,104 @@ specific language governing permissions and limitations
 under the License.
 -->
-## ZooKeeper HA Services
+Flink's ZooKeeper HA services use [ZooKeeper](http://zookeeper.apache.org) for high availability services.
-One high availability services implementation uses ZooKeeper.
+* Toc
+{:toc}
-### Configuration
+Flink leverages **[ZooKeeper](http://zookeeper.apache.org)** for *distributed coordination* between all running JobManager instances.
+ZooKeeper is a separate service from Flink, which provides highly reliable distributed coordination via leader election and light-weight consistent state storage.
+Check out [ZooKeeper's Getting Started Guide](http://zookeeper.apache.org/doc/current/zookeeperStarted.html) for more information about ZooKeeper.
+Flink includes scripts to [bootstrap a simple ZooKeeper](#bootstrap-zookeeper) installation.
-To enable JobManager High Availability you have to set the **high-availability mode** to *zookeeper*, configure a **ZooKeeper quorum** and set up a **masters file** with all JobManagers hosts and their web UI ports.
+## Configuration
-Flink leverages **[ZooKeeper](http://zookeeper.apache.org)** for *distributed coordination* between all running JobManager instances. ZooKeeper is a separate service from Flink, which provides highly reliable distributed coordination via leader election and light-weight consistent state storage. Check out [ZooKeeper's Getting Started Guide](http://zookeeper.apache.org/doc/current/zookeeperStarted.html) for more information about ZooKeeper. Flink includes scripts to [bootstrap a simple ZooKeeper](#bootstrap-zookeeper) installation.
+In order to start an HA-cluster you have to configure the following configuration keys:
-#### Masters File (masters)
-
-In order to start an HA-cluster configure the *masters* file in `conf/masters`:
-
-- **masters file**: The *masters file* contains all hosts, on which JobManagers are started, and the ports to which the web user interface binds.
-
- <pre>
-jobManagerAddress1:webUIPort1
-[...]
-jobManagerAddressX:webUIPortX
- </pre>
-
-By default, the job manager will pick a *random port* for inter process communication. You can change this via the **`high-availability.jobmanager.port`** key. This key accepts single ports (e.g. `50010`), ranges (`50000-50025`), or a combination of both (`50010,50011,50020-50025,50050-50075`).
-
-#### Config File (flink-conf.yaml)
-
-In order to start an HA-cluster add the following configuration keys to `conf/flink-conf.yaml`:
-
-- **high-availability mode** (required): The *high-availability mode* has to be set in `conf/flink-conf.yaml` to *zookeeper* in order to enable high availability mode.
-Alternatively this option can be set to FQN of factory class Flink should use to create HighAvailabilityServices instance.
+- **high-availability mode** (required):
+The `high-availability` option has to be set to *zookeeper*.
 <pre>high-availability: zookeeper</pre>
-- **ZooKeeper quorum** (required): A *ZooKeeper quorum* is a replicated group of ZooKeeper servers, which provide the distributed coordination service.
+- **ZooKeeper quorum** (required):
+A *ZooKeeper quorum* is a replicated group of ZooKeeper servers, which provide the distributed coordination service.
 <pre>high-availability.zookeeper.quorum: address1:2181[,...],addressX:2181</pre>
 Each *addressX:port* refers to a ZooKeeper server, which is reachable by Flink at the given address and port.
-- **ZooKeeper root** (recommended): The *root ZooKeeper node*, under which all cluster nodes are placed.
+- **ZooKeeper root** (recommended):
+The *root ZooKeeper node*, under which all cluster nodes are placed.
- <pre>high-availability.zookeeper.path.root: /flink
+ <pre>high-availability.zookeeper.path.root: /flink</pre>
-- **ZooKeeper cluster-id** (recommended): The *cluster-id ZooKeeper node*, under which all required coordination data for a cluster is placed.
+- **ZooKeeper cluster-id** (recommended):
+The *cluster-id ZooKeeper node*, under which all required coordination data for a cluster is placed.
 <pre>high-availability.cluster-id: /default_ns # important: customize per cluster</pre>
- **Important**: You should not set this value manually when running a YARN
- cluster, a per-job YARN session, or on another cluster manager. In those
- cases a cluster-id is automatically being generated based on the application
- id. Manually setting a cluster-id overrides this behaviour in YARN.
- Specifying a cluster-id with the -z CLI option, in turn, overrides manual
- configuration. If you are running multiple Flink HA clusters on bare metal,
- you have to manually configure separate cluster-ids for each cluster.
+ **Important**:
+ You should not set this value manually when running on YARN, native Kubernetes or on another cluster manager.
+ In those cases a cluster-id is automatically being generated.
+ If you are running multiple Flink HA clusters on bare metal, you have to manually configure separate cluster-ids for each cluster.
-- **Storage directory** (required): JobManager metadata is persisted in the file system *storageDir* and only a pointer to this state is stored in ZooKeeper.
+- **Storage directory** (required):
+JobManager metadata is persisted in the file system `high-availability.storageDir` and only a pointer to this state is stored in ZooKeeper.
- <pre>
-high-availability.storageDir: hdfs:///flink/recovery
- </pre>
-
- The `storageDir` stores all metadata needed to recover a JobManager failure.
+ <pre>high-availability.storageDir: hdfs:///flink/recovery</pre>
-After configuring the masters and the ZooKeeper quorum, you can use the provided cluster startup scripts as usual. They will start an HA-cluster. Keep in mind that the **ZooKeeper quorum has to be running** when you call the scripts and make sure to **configure a separate ZooKeeper root path** for each HA cluster you are starting.
+ The `storageDir` stores all metadata needed to recover a JobManager failure.
-#### Example: Standalone Cluster with 2 JobManagers
+### Example configuration
-1. **Configure high availability mode and ZooKeeper quorum** in `conf/flink-conf.yaml`:
+Configure high availability mode and ZooKeeper quorum in `conf/flink-conf.yaml`:
- <pre>
+{% highlight bash %}
 high-availability: zookeeper
 high-availability.zookeeper.quorum: localhost:2181
 high-availability.zookeeper.path.root: /flink
 high-availability.cluster-id: /cluster_one # important: customize per cluster
-high-availability.storageDir: hdfs:///flink/recovery</pre>
+high-availability.storageDir: hdfs:///flink/recovery
+{% endhighlight %}
+
+## Configuring for Zookeeper Security
-2. **Configure masters** in `conf/masters`:
+If ZooKeeper is running in secure mode with Kerberos, you can override the following configurations in `flink-conf.yaml` as necessary:
- <pre>
-localhost:8081
-localhost:8082</pre>
+{% highlight bash %}
+zookeeper.sasl.service-name: zookeeper # default is "zookeeper". If the ZooKeeper quorum is configured
+ # with a different service name then it can be supplied here.
+zookeeper.sasl.login-context-name: Client # default is "Client". The value needs to match one of the values
+ # configured in "security.kerberos.login.contexts".
+{% endhighlight %}
-3. **Configure ZooKeeper server** in `conf/zoo.cfg` (currently it's only possible to run a single ZooKeeper server per machine):
+For more information on Flink configuration for Kerberos security, please see [here]({% link deployment/config.md %}).
+You can also find [here]({% link deployment/security/security-kerberos.md %}) further details on how Flink internally setups Kerberos-based security.

Review comment:
Will update it.