[ https://issues.apache.org/jira/browse/FLINK-7965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16235799#comment-16235799 ]
Thalita Vergilio commented on FLINK-7965: ----------------------------------------- And here is the log from the container running TaskManager: {quote} Starting Task Manager config file: jobmanager.rpc.address: jobmanager jobmanager.rpc.port: 6123 jobmanager.heap.mb: 1024 taskmanager.heap.mb: 1024 taskmanager.numberOfTaskSlots: 2 taskmanager.memory.preallocate: false parallelism.default: 1 jobmanager.web.port: 8081 blob.server.port: 6124 query.server.port: 6125 Starting taskmanager as a console application on host 00afd4130a94. 2017-11-02 14:06:50,870 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 2017-11-02 14:06:50,944 INFO org.apache.flink.runtime.taskmanager.TaskManager - -------------------------------------------------------------------------------- 2017-11-02 14:06:50,944 INFO org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager (Version: 1.3.2, Rev:0399bee, Date:03.08.2017 @ 10:23:11 UTC) 2017-11-02 14:06:50,945 INFO org.apache.flink.runtime.taskmanager.TaskManager - Current user: flink 2017-11-02 14:06:50,945 INFO org.apache.flink.runtime.taskmanager.TaskManager - JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.8/25.141-b15 2017-11-02 14:06:50,945 INFO org.apache.flink.runtime.taskmanager.TaskManager - Maximum heap size: 1024 MiBytes 2017-11-02 14:06:50,945 INFO org.apache.flink.runtime.taskmanager.TaskManager - JAVA_HOME: /docker-java-home/jre 2017-11-02 14:06:50,948 INFO org.apache.flink.runtime.taskmanager.TaskManager - Hadoop version: 2.7.2 2017-11-02 14:06:50,948 INFO org.apache.flink.runtime.taskmanager.TaskManager - JVM Options: 2017-11-02 14:06:50,948 INFO org.apache.flink.runtime.taskmanager.TaskManager - -XX:+UseG1GC 2017-11-02 14:06:50,948 INFO org.apache.flink.runtime.taskmanager.TaskManager - -Xms1024M 2017-11-02 14:06:50,948 INFO org.apache.flink.runtime.taskmanager.TaskManager - -Xmx1024M 2017-11-02 14:06:50,948 INFO org.apache.flink.runtime.taskmanager.TaskManager - -XX:MaxDirectMemorySize=8388607T 2017-11-02 14:06:50,948 INFO org.apache.flink.runtime.taskmanager.TaskManager - -Dlog4j.configuration=file:/opt/flink/conf/log4j-console.properties 2017-11-02 14:06:50,949 INFO org.apache.flink.runtime.taskmanager.TaskManager - -Dlogback.configurationFile=file:/opt/flink/conf/logback-console.xml 2017-11-02 14:06:50,949 INFO org.apache.flink.runtime.taskmanager.TaskManager - Program Arguments: 2017-11-02 14:06:50,949 INFO org.apache.flink.runtime.taskmanager.TaskManager - --configDir 2017-11-02 14:06:50,949 INFO org.apache.flink.runtime.taskmanager.TaskManager - /opt/flink/conf 2017-11-02 14:06:50,949 INFO org.apache.flink.runtime.taskmanager.TaskManager - Classpath: /opt/flink/lib/flink-python_2.11-1.3.2.jar:/opt/flink/lib/flink-shaded-hadoop2-uber-1.3.2.jar:/opt/flink/lib/log4j-1.2.17.jar:/opt/flink/lib/slf4j-log4j12-1.7.7.jar:/opt/flink/lib/flink-dist_2.11-1.3.2.jar::: 2017-11-02 14:06:50,949 INFO org.apache.flink.runtime.taskmanager.TaskManager - -------------------------------------------------------------------------------- 2017-11-02 14:06:50,950 INFO org.apache.flink.runtime.taskmanager.TaskManager - Registered UNIX signal handlers for [TERM, HUP, INT] 2017-11-02 14:06:50,953 INFO org.apache.flink.runtime.taskmanager.TaskManager - Maximum number of open file descriptors is 1048576 2017-11-02 14:06:50,972 INFO org.apache.flink.runtime.taskmanager.TaskManager - Loading configuration from /opt/flink/conf 2017-11-02 14:06:50,976 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, jobmanager 2017-11-02 14:06:50,976 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123 2017-11-02 14:06:50,976 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.mb, 1024 2017-11-02 14:06:50,977 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.mb, 1024 2017-11-02 14:06:50,977 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 2 2017-11-02 14:06:50,977 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.preallocate, false 2017-11-02 14:06:50,977 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1 2017-11-02 14:06:50,977 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.web.port, 8081 2017-11-02 14:06:50,978 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: blob.server.port, 6124 2017-11-02 14:06:50,978 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: query.server.port, 6125 2017-11-02 14:06:50,985 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, jobmanager 2017-11-02 14:06:50,986 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123 2017-11-02 14:06:50,986 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.mb, 1024 2017-11-02 14:06:50,986 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.mb, 1024 2017-11-02 14:06:50,986 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 2 2017-11-02 14:06:50,986 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.preallocate, false 2017-11-02 14:06:50,987 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1 2017-11-02 14:06:50,987 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.web.port, 8081 2017-11-02 14:06:50,988 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: blob.server.port, 6124 2017-11-02 14:06:50,988 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: query.server.port, 6125 2017-11-02 14:06:51,013 INFO org.apache.flink.runtime.security.modules.HadoopModule - Hadoop user set to flink (auth:SIMPLE) 2017-11-02 14:06:51,064 INFO org.apache.flink.runtime.util.LeaderRetrievalUtils - Trying to select the network interface and address to use by connecting to the leading JobManager. 2017-11-02 14:06:51,065 INFO org.apache.flink.runtime.util.LeaderRetrievalUtils - TaskManager will try to connect for 10000 milliseconds before falling back to heuristics 2017-11-02 14:06:51,067 INFO org.apache.flink.runtime.net.ConnectionUtils - Retrieved new target address jobmanager/10.0.0.2:6123. 2017-11-02 14:06:54,578 INFO org.apache.flink.runtime.net.ConnectionUtils - Trying to connect to address jobmanager/10.0.0.2:6123 2017-11-02 14:06:54,779 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '00afd4130a94/10.0.0.5': connect timed out 2017-11-02 14:06:54,829 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.0.0.5': connect timed out 2017-11-02 14:06:54,880 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.0.0.4': connect timed out 2017-11-02 14:06:54,931 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/172.18.0.3': connect timed out 2017-11-02 14:06:54,981 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.0.0.5': connect timed out 2017-11-02 14:06:55,031 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.0.0.4': connect timed out 2017-11-02 14:06:55,032 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/127.0.0.1': Invalid argument (connect failed) 2017-11-02 14:06:56,034 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/172.18.0.3': connect timed out 2017-11-02 14:06:57,036 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.0.0.5': connect timed out 2017-11-02 14:06:58,037 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.0.0.4': connect timed out 2017-11-02 14:06:58,038 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/127.0.0.1': Invalid argument (connect failed) 2017-11-02 14:06:58,138 INFO org.apache.flink.runtime.net.ConnectionUtils - Trying to connect to address jobmanager/10.0.0.2:6123 2017-11-02 14:06:58,339 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '00afd4130a94/10.0.0.5': connect timed out 2017-11-02 14:06:58,389 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.0.0.5': connect timed out 2017-11-02 14:06:58,439 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.0.0.4': connect timed out 2017-11-02 14:06:58,490 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/172.18.0.3': connect timed out 2017-11-02 14:06:58,541 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.0.0.5': connect timed out 2017-11-02 14:06:58,592 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.0.0.4': connect timed out 2017-11-02 14:06:58,592 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/127.0.0.1': Invalid argument (connect failed) 2017-11-02 14:06:59,593 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/172.18.0.3': connect timed out 2017-11-02 14:07:00,595 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.0.0.5': connect timed out 2017-11-02 14:07:01,599 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.0.0.4': connect timed out 2017-11-02 14:07:01,599 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/127.0.0.1': Invalid argument (connect failed) 2017-11-02 14:07:01,600 WARN org.apache.flink.runtime.net.ConnectionUtils - Could not connect to jobmanager/10.0.0.2:6123. Selecting a local address using heuristics. 2017-11-02 14:07:01,601 INFO org.apache.flink.runtime.taskmanager.TaskManager - TaskManager will use hostname/address '00afd4130a94' (10.0.0.5) for communication. 2017-11-02 14:07:01,601 INFO org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager 2017-11-02 14:07:01,601 INFO org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager actor system at 00afd4130a94:0. 2017-11-02 14:07:01,947 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started 2017-11-02 14:07:01,978 INFO Remoting - Starting remoting 2017-11-02 14:07:02,168 INFO Remoting - Remoting started; listening on addresses :[akka.tcp://flink@00afd4130a94:33881] 2017-11-02 14:07:02,174 INFO org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager actor 2017-11-02 14:07:02,192 INFO org.apache.flink.runtime.io.network.netty.NettyConfig - NettyConfig [server address: 00afd4130a94/10.0.0.5, server port: 0, ssl enabled: false, memory segment size (bytes): 32768, transport type: NIO, number of server threads: 2 (manual), number of client threads: 2 (manual), server connect backlog: 0 (use Netty's default), client connect timeout (sec): 120, send/receive buffer size (bytes): 0 (use Netty's default)] 2017-11-02 14:07:02,199 INFO org.apache.flink.runtime.taskexecutor.TaskManagerConfiguration - Messages have a max timeout of 10000 ms 2017-11-02 14:07:02,201 INFO org.apache.flink.runtime.taskexecutor.TaskManagerServices - Temporary file directory '/tmp': total 29 GB, usable 25 GB (86.21% usable) 2017-11-02 14:07:02,286 INFO org.apache.flink.runtime.io.network.buffer.NetworkBufferPool - Allocated 101 MB for network buffer pool (number of memory segments: 3260, bytes per segment: 32768). 2017-11-02 14:07:02,393 INFO org.apache.flink.runtime.io.network.NetworkEnvironment - Starting the network environment and its components. 2017-11-02 14:07:02,400 INFO org.apache.flink.runtime.io.network.netty.NettyClient - Successful initialization (took 2 ms). 2017-11-02 14:07:02,434 INFO org.apache.flink.runtime.io.network.netty.NettyServer - Successful initialization (took 32 ms). Listening on SocketAddress /10.0.0.5:42921. 2017-11-02 14:07:02,493 INFO org.apache.flink.runtime.taskexecutor.TaskManagerServices - Limiting managed memory to 0.7 of the currently free heap space (640 MB), memory will be allocated lazily. 2017-11-02 14:07:02,498 INFO org.apache.flink.runtime.io.disk.iomanager.IOManager - I/O manager uses directory /tmp/flink-io-e57d51fa-2269-4df0-9910-0fe26c6042bd for spill files. 2017-11-02 14:07:02,501 INFO org.apache.flink.runtime.metrics.MetricRegistry - No metrics reporter configured, no metrics will be exposed/reported. 2017-11-02 14:07:02,553 INFO org.apache.flink.runtime.filecache.FileCache - User file cache uses directory /tmp/flink-dist-cache-2c0c063f-464e-48f1-9fb8-fcfa48868e3a 2017-11-02 14:07:02,564 INFO org.apache.flink.runtime.filecache.FileCache - User file cache uses directory /tmp/flink-dist-cache-0c5e2b25-70a2-4964-9eec-24b0e79d560e 2017-11-02 14:07:02,572 INFO org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager actor at akka://flink/user/taskmanager#1719715507. 2017-11-02 14:07:02,572 INFO org.apache.flink.runtime.taskmanager.TaskManager - TaskManager data connection information: df5992297d269fa16a5e945e1dce0451 @ 00afd4130a94 (dataPort=42921) 2017-11-02 14:07:02,573 INFO org.apache.flink.runtime.taskmanager.TaskManager - TaskManager has 2 task slot(s). 2017-11-02 14:07:02,574 INFO org.apache.flink.runtime.taskmanager.TaskManager - Memory usage stats: [HEAP: 113/1024/1024 MB, NON HEAP: 33/33/-1 MB (used/committed/max)] 2017-11-02 14:07:02,576 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@jobmanager:6123/user/jobmanager (attempt 1, timeout: 500 milliseconds) 2017-11-02 14:07:03,106 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@jobmanager:6123/user/jobmanager (attempt 2, timeout: 1000 milliseconds) 2017-11-02 14:07:04,126 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@jobmanager:6123/user/jobmanager (attempt 3, timeout: 2000 milliseconds){quote} > Docker-Flink: TaskManagers can't find JobManager when in different nodes in > Docker Swarm > ---------------------------------------------------------------------------------------- > > Key: FLINK-7965 > URL: https://issues.apache.org/jira/browse/FLINK-7965 > Project: Flink > Issue Type: Bug > Components: Docker > Affects Versions: 1.3.2 > Environment: node: ubuntu-swarm-master > Azure VM Standard D4s v3 (4 vcpus, 16 GB memory) > Docker version 17.03.1-ce, build c6d412e > node: azure-swarm-worker-1 > Azure VM Standard D2 v2 Promo (2 vcpus, 7 GB memory) > Docker version 17.09.0-ce, build afdb6d4 > Flink: using image 1.3.2-hadoop2-scala_2.10 > Reporter: Thalita Vergilio > Priority: Major > > This happens even when the nodes are in the same subnet. > I am using the Docker-Flink project in: > https://github.com/apache/flink/tree/master/flink-contrib/docker-flink > I am creating the services with the following commands: > {quote}docker network create -d overlay overlay > docker service create --name jobmanager --env > JOB_MANAGER_RPC_ADDRESS=jobmanager -p 8081:8081 --network overlay > --constraint 'node.hostname == ubuntu-swarm-manager' flink jobmanager > docker service create --name taskmanager --env > JOB_MANAGER_RPC_ADDRESS=jobmanager --network overlay --constraint > 'node.hostname != ubuntu-swarm-manager' flink taskmanager {quote} > I wonder if there's any configuration I'm missing. This is the error I get: > {quote} Trying to register at JobManager akka.tcp://flink@jobmanager:6123/ > user/jobmanager (attempt 4, timeout: 4000 milliseconds) {quote} -- This message was sent by Atlassian JIRA (v6.4.14#64029)