[ https://issues.apache.org/jira/browse/FLINK-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15959444#comment-15959444 ]
ASF GitHub Bot commented on FLINK-5974:
---------------------------------------

GitHub user vijikarthi opened a pull request:

https://github.com/apache/flink/pull/3692

FLINK-5974 Added configurations to support mesos-dns hostname resolution

This PR addresses the requirements of FLINK-5974: handling dynamic hostname resolution for the JM and TM components, especially in deployment environments such as Mesos and DC/OS. It adds two main pieces of functionality.

a) Dynamic hostname configuration

Support for specifying the hostname of the JM/TM is already available through the `jobmanager.rpc.address` and `taskmanager.hostname` configurations. However, in a Mesos DC/OS type of environment, each task container can be looked up using a hostname alias of the form `<task>.<service>.mesos`, where service discovery is managed through `mesos-dns`. To support this dynamic hostname lookup, we have introduced a new configuration, `mesos.resourcemanager.tasks.hostname`, which takes the format `_TASK.<ANY_VALUE>`. When this property is supplied, the `_TASK` token is replaced with the `TASK_ID` of the TM container, and the resulting string is used to populate the `taskmanager.hostname` configuration.

For example, in a DC/OS setup one could supply the configuration as `-Dmesos.resourcemanager.tasks.hostname=_TASK.{{FRAMEWORK_NAME}}.mesos`, where `FRAMEWORK_NAME` could be `flink`.

Please refer to https://docs.mesosphere.com/1.9/usage/service-discovery/mesos-dns/service-naming/#a-records for more details on how Mesos service discovery works.

b) Support for running *any* bootstrap script before executing the TM startup script

Currently, the TM boot script `mesos-taskmanager.sh` is the only script passed to the Mesos launcher for booting the TM container. In a DC/OS environment, where service discovery is common, we need a mechanism to wait until the service discovery records are available and the hostname is actually resolvable before launching the TM boot script.
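To make the "wait until the hostname is resolvable" requirement concrete, here is a minimal Java sketch of such a wait loop. It is not the dcos-commons bootstrap tool and not code from this pull request; the class `DnsWait`, the method `awaitResolvable`, and both millisecond parameters are hypothetical names chosen for illustration.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical helper: polls DNS until a hostname resolves or a deadline passes. */
final class DnsWait {

    private DnsWait() {}

    /**
     * Returns true as soon as {@code hostname} resolves. Returns false if it
     * still does not resolve after {@code timeoutMillis}, or if the calling
     * thread is interrupted while waiting.
     */
    static boolean awaitResolvable(String hostname, long timeoutMillis, long pollMillis) {
        final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            try {
                InetAddress.getByName(hostname);
                return true;
            } catch (UnknownHostException e) {
                // Record not visible yet; fall through and retry.
                // Note: the JVM caches failed lookups for a while (security
                // property networkaddress.cache.negative.ttl), so very short
                // poll intervals may not observe a new record any sooner.
            }
            // Overflow-safe comparison against the nanoTime-based deadline.
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
    }
}
```

A wrapper would call something like `DnsWait.awaitResolvable(hostname, 60_000, 1_000)` and only start the TM boot script if it returns true; the timeout and interval values here are arbitrary.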
DC/OS deployments offer a way to validate and wait for the service discovery records to be available before launching the tasks. Please see the links below for more details on how it works.

https://mesosphere.github.io/dcos-commons/developer-guide.html#task-bootstrap
https://github.com/mesosphere/dcos-commons/blob/master/sdk/bootstrap/main.go

To support this, we have introduced a new configuration, `mesos.resourcemanager.tasks.cmd-prefix` (for example, `mesos.resourcemanager.tasks.cmd-prefix=$FLINK_HOME/bin/bootstrap`), which names any executable/script that should run before the TM startup command is executed. This feature *currently* works *only for Docker-based images*, where the bootstrap script can be pre-baked into a known location that `mesos.resourcemanager.tasks.cmd-prefix` then points to.

Although both features were motivated by Mesos and DC/OS deployments, the implementation is agnostic of these environments and can be used by any deployment that needs such a facility.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vijikarthi/flink FLINK-5974-Master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/3692.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3692

----
commit aeb432dc7fe8bcdd5faa49b8ad5dfb5630ea0747
Author: Vijay Srinivasaraghavan <vijayaraghavan.srinivasaragha...@emc.com>
Date:   2017-04-06T16:48:39Z

    FLINK-5974 Added configurations to support mesos-dns hostname resolution

----

> Support Mesos DNS
> -----------------
>
> Key: FLINK-5974
> URL: https://issues.apache.org/jira/browse/FLINK-5974
> Project: Flink
> Issue Type: Improvement
> Components: Cluster Management, Mesos
> Reporter: Eron Wright
> Assignee: Vijay Srinivasaraghavan
>
> In certain Mesos/DCOS environments, the slave hostnames aren't resolvable.
> For this and other reasons, Mesos DNS names would ideally be used for
> communication within the Flink cluster, not the hostname discovered via
> `InetAddress.getLocalHost`.
> Some parts of Flink are already configurable in this respect, notably
> `jobmanager.rpc.address`. However, the Mesos AppMaster doesn't use that
> setting for everything (e.g. artifact server), it uses the hostname.
> Similarly, the `taskmanager.hostname` setting isn't used in Mesos deployment
> mode. To effectively use Mesos DNS, the TM should use
> `<task-name>.<framework-name>.mesos` as its hostname. This could be derived
> from an interpolated configuration string.
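
The interpolation the issue asks for, using the `_TASK` token from the pull request description, can be sketched in Java as follows. This is an illustration only, not the code in the pull request: the class `TaskHostname`, the method `resolve`, and the sample task ID are hypothetical, and the sketch assumes a plain literal substitution of the token.

```java
/** Hypothetical helper illustrating the _TASK token substitution. */
final class TaskHostname {

    /** Token that, per the PR description, is replaced with the TM container's task ID. */
    static final String TASK_TOKEN = "_TASK";

    private TaskHostname() {}

    /**
     * Derives a hostname from a template such as "_TASK.flink.mesos".
     * Rejecting templates without the token is an assumption of this sketch.
     */
    static String resolve(String template, String taskId) {
        if (!template.contains(TASK_TOKEN)) {
            throw new IllegalArgumentException(
                    "template must contain " + TASK_TOKEN + ": " + template);
        }
        // String.replace substitutes literally (no regex), so any
        // special characters in the task ID are copied through unchanged.
        return template.replace(TASK_TOKEN, taskId);
    }
}
```

With a made-up task ID, `TaskHostname.resolve("_TASK.flink.mesos", "taskmanager-00001")` yields `taskmanager-00001.flink.mesos`, which matches the `<task-name>.<framework-name>.mesos` form and would then be used as the value of `taskmanager.hostname`.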