[ https://issues.apache.org/jira/browse/FLINK-7965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16235798#comment-16235798 ]
Thalita Vergilio commented on FLINK-7965: ----------------------------------------- Full log from the container running JobManager: {quote}Starting Job Manager config file: jobmanager.rpc.address: jobmanager jobmanager.rpc.port: 6123 jobmanager.heap.mb: 1024 taskmanager.heap.mb: 1024 taskmanager.numberOfTaskSlots: 1 taskmanager.memory.preallocate: false parallelism.default: 1 jobmanager.web.port: 8081 blob.server.port: 6124 query.server.port: 6125 Starting jobmanager as a console application on host c30e0fe7b765. 2017-11-02 13:42:33,721 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 2017-11-02 13:42:33,796 INFO org.apache.flink.runtime.jobmanager.JobManager - -------------------------------------------------------------------------------- 2017-11-02 13:42:33,796 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting JobManager (Version: 1.3.2, Rev:0399bee, Date:03.08.2017 @ 10:23:11 UTC) 2017-11-02 13:42:33,796 INFO org.apache.flink.runtime.jobmanager.JobManager - Current user: flink 2017-11-02 13:42:33,796 INFO org.apache.flink.runtime.jobmanager.JobManager - JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.8/25.141-b15 2017-11-02 13:42:33,796 INFO org.apache.flink.runtime.jobmanager.JobManager - Maximum heap size: 981 MiBytes 2017-11-02 13:42:33,796 INFO org.apache.flink.runtime.jobmanager.JobManager - JAVA_HOME: /docker-java-home/jre 2017-11-02 13:42:33,799 INFO org.apache.flink.runtime.jobmanager.JobManager - Hadoop version: 2.7.2 2017-11-02 13:42:33,800 INFO org.apache.flink.runtime.jobmanager.JobManager - JVM Options: 2017-11-02 13:42:33,800 INFO org.apache.flink.runtime.jobmanager.JobManager - -Xms1024m 2017-11-02 13:42:33,800 INFO org.apache.flink.runtime.jobmanager.JobManager - -Xmx1024m 2017-11-02 13:42:33,800 INFO org.apache.flink.runtime.jobmanager.JobManager - -Dlog4j.configuration=file:/opt/flink/conf/log4j-console.properties 2017-11-02 13:42:33,800 INFO org.apache.flink.runtime.jobmanager.JobManager - -Dlogback.configurationFile=file:/opt/flink/conf/logback-console.xml 2017-11-02 13:42:33,800 INFO org.apache.flink.runtime.jobmanager.JobManager - Program Arguments: 2017-11-02 13:42:33,800 INFO org.apache.flink.runtime.jobmanager.JobManager - --configDir 2017-11-02 13:42:33,800 INFO org.apache.flink.runtime.jobmanager.JobManager - /opt/flink/conf 2017-11-02 13:42:33,800 INFO org.apache.flink.runtime.jobmanager.JobManager - --executionMode 2017-11-02 13:42:33,800 INFO org.apache.flink.runtime.jobmanager.JobManager - cluster 2017-11-02 13:42:33,800 INFO org.apache.flink.runtime.jobmanager.JobManager - Classpath: /opt/flink/lib/flink-python_2.11-1.3.2.jar:/opt/flink/lib/flink-shaded-hadoop2-uber-1.3.2.jar:/opt/flink/lib/log4j-1.2.17.jar:/opt/flink/lib/slf4j-log4j12-1.7.7.jar:/opt/flink/lib/flink-dist_2.11-1.3.2.jar::: 2017-11-02 13:42:33,801 INFO org.apache.flink.runtime.jobmanager.JobManager - -------------------------------------------------------------------------------- 2017-11-02 13:42:33,801 INFO org.apache.flink.runtime.jobmanager.JobManager - Registered UNIX signal handlers for [TERM, HUP, INT] 2017-11-02 13:42:33,911 INFO org.apache.flink.runtime.jobmanager.JobManager - Loading configuration from /opt/flink/conf 2017-11-02 13:42:33,914 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, jobmanager 2017-11-02 13:42:33,915 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123 2017-11-02 13:42:33,915 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.mb, 1024 2017-11-02 13:42:33,915 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.mb, 1024 2017-11-02 13:42:33,915 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 1 2017-11-02 13:42:33,915 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.preallocate, false 2017-11-02 13:42:33,916 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1 2017-11-02 13:42:33,916 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.web.port, 8081 2017-11-02 13:42:33,917 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: blob.server.port, 6124 2017-11-02 13:42:33,917 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: query.server.port, 6125 2017-11-02 13:42:33,924 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting JobManager without high-availability 2017-11-02 13:42:33,926 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting JobManager on jobmanager:6123 with execution mode CLUSTER 2017-11-02 13:42:33,934 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, jobmanager 2017-11-02 13:42:33,934 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123 2017-11-02 13:42:33,934 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.mb, 1024 2017-11-02 13:42:33,934 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.mb, 1024 2017-11-02 13:42:33,935 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 1 2017-11-02 13:42:33,935 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.preallocate, false 2017-11-02 13:42:33,935 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1 2017-11-02 13:42:33,935 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.web.port, 8081 2017-11-02 13:42:33,936 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: blob.server.port, 6124 2017-11-02 13:42:33,936 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: query.server.port, 6125 2017-11-02 13:42:33,962 INFO org.apache.flink.runtime.security.modules.HadoopModule - Hadoop user set to flink (auth:SIMPLE) 2017-11-02 13:42:34,026 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting JobManager actor system reachable at jobmanager:6123 2017-11-02 13:42:34,290 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started 2017-11-02 13:42:34,327 INFO Remoting - Starting remoting 2017-11-02 13:42:34,505 INFO Remoting - Remoting started; listening on addresses :[akka.tcp://flink@jobmanager:6123] 2017-11-02 13:42:34,524 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting JobManager web frontend 2017-11-02 13:42:34,532 WARN org.apache.flink.runtime.webmonitor.WebMonitorUtils - Log file environment variable 'log.file' is not set. 2017-11-02 13:42:34,532 WARN org.apache.flink.runtime.webmonitor.WebMonitorUtils - JobManager log files are unavailable in the web dashboard. Log file location not found in environment variable 'log.file' or configuration key 'jobmanager.web.log.path'. 2017-11-02 13:42:34,532 INFO org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Using directory /tmp/flink-web-9f0ba581-3488-4086-a79c-53e17b56352c for the web interface files 2017-11-02 13:42:34,533 INFO org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Using directory /tmp/flink-web-17a58ccf-7d8b-475e-b727-4a7935a19c0f for web frontend JAR file uploads 2017-11-02 13:42:34,741 INFO org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Web frontend listening at 0:0:0:0:0:0:0:0:8081 2017-11-02 13:42:34,741 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting JobManager actor 2017-11-02 13:42:34,751 INFO org.apache.flink.runtime.blob.BlobServer - Created BLOB server storage directory /tmp/blobStore-d10b620a-73ae-40af-bd23-aad5211fe1cc 2017-11-02 13:42:34,752 INFO org.apache.flink.runtime.blob.BlobServer - Started BLOB server at 0.0.0.0:6124 - max concurrent requests: 50 - max backlog: 1000 2017-11-02 13:42:34,763 INFO org.apache.flink.runtime.metrics.MetricRegistry - No metrics reporter configured, no metrics will be exposed/reported. 2017-11-02 13:42:34,769 INFO org.apache.flink.runtime.jobmanager.MemoryArchivist - Started memory archivist akka://flink/user/archive 2017-11-02 13:42:34,774 INFO org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Starting with JobManager akka.tcp://flink@jobmanager:6123/user/jobmanager on port 8081 2017-11-02 13:42:34,774 INFO org.apache.flink.runtime.webmonitor.JobManagerRetriever - New leader reachable under akka.tcp://flink@jobmanager:6123/user/jobmanager:00000000-0000-0000-0000-000000000000. 2017-11-02 13:42:34,776 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting JobManager at akka.tcp://flink@jobmanager:6123/user/jobmanager. 2017-11-02 13:42:34,785 INFO org.apache.flink.runtime.clusterframework.standalone.StandaloneResourceManager - Trying to associate with JobManager leader akka.tcp://flink@jobmanager:6123/user/jobmanager 2017-11-02 13:42:34,801 INFO org.apache.flink.runtime.jobmanager.JobManager - JobManager akka.tcp://flink@jobmanager:6123/user/jobmanager was granted leadership with leader session ID Some(00000000-0000-0000-0000-000000000000). 2017-11-02 13:42:34,814 INFO org.apache.flink.runtime.clusterframework.standalone.StandaloneResourceManager - Resource Manager associating with leading JobManager Actor[akka://flink/user/jobmanager#844712453] - leader session 00000000-0000-0000-0000-000000000000{quote} > Docker-Flink: TaskManagers can't find JobManager when in different nodes in > Docker Swarm > ---------------------------------------------------------------------------------------- > > Key: FLINK-7965 > URL: https://issues.apache.org/jira/browse/FLINK-7965 > Project: Flink > Issue Type: Bug > Components: Docker > Affects Versions: 1.3.2 > Environment: node: ubuntu-swarm-master > Azure VM Standard D4s v3 (4 vcpus, 16 GB memory) > Docker version 17.03.1-ce, build c6d412e > node: azure-swarm-worker-1 > Azure VM Standard D2 v2 Promo (2 vcpus, 7 GB memory) > Docker version 17.09.0-ce, build afdb6d4 > Flink: using image 1.3.2-hadoop2-scala_2.10 > Reporter: Thalita Vergilio > Priority: Major > > This happens even when the nodes are in the same subnet. > I am using the Docker-Flink project in: > https://github.com/apache/flink/tree/master/flink-contrib/docker-flink > I am creating the services with the following commands: > {quote}docker network create -d overlay overlay > docker service create --name jobmanager --env > JOB_MANAGER_RPC_ADDRESS=jobmanager -p 8081:8081 --network overlay > --constraint 'node.hostname == ubuntu-swarm-manager' flink jobmanager > docker service create --name taskmanager --env > JOB_MANAGER_RPC_ADDRESS=jobmanager --network overlay --constraint > 'node.hostname != ubuntu-swarm-manager' flink taskmanager {quote} > I wonder if there's any configuration I'm missing. This is the error I get: > {quote} Trying to register at JobManager akka.tcp://flink@jobmanager:6123/ > user/jobmanager (attempt 4, timeout: 4000 milliseconds) {quote} -- This message was sent by Atlassian JIRA (v6.4.14#64029)