[ 
https://issues.apache.org/jira/browse/FLINK-7965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16235799#comment-16235799
 ] 

Thalita Vergilio commented on FLINK-7965:
-----------------------------------------

And here is the log from the container running the TaskManager:

{quote}
Starting Task Manager
config file:
jobmanager.rpc.address: jobmanager
jobmanager.rpc.port: 6123
jobmanager.heap.mb: 1024
taskmanager.heap.mb: 1024
taskmanager.numberOfTaskSlots: 2
taskmanager.memory.preallocate: false
parallelism.default: 1
jobmanager.web.port: 8081
blob.server.port: 6124
query.server.port: 6125
Starting taskmanager as a console application on host 00afd4130a94.
2017-11-02 14:06:50,870 WARN  org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2017-11-02 14:06:50,944 INFO  org.apache.flink.runtime.taskmanager.TaskManager - --------------------------------------------------------------------------------
2017-11-02 14:06:50,944 INFO  org.apache.flink.runtime.taskmanager.TaskManager -  Starting TaskManager (Version: 1.3.2, Rev:0399bee, Date:03.08.2017 @ 10:23:11 UTC)
2017-11-02 14:06:50,945 INFO  org.apache.flink.runtime.taskmanager.TaskManager -  Current user: flink
2017-11-02 14:06:50,945 INFO  org.apache.flink.runtime.taskmanager.TaskManager -  JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.8/25.141-b15
2017-11-02 14:06:50,945 INFO  org.apache.flink.runtime.taskmanager.TaskManager -  Maximum heap size: 1024 MiBytes
2017-11-02 14:06:50,945 INFO  org.apache.flink.runtime.taskmanager.TaskManager -  JAVA_HOME: /docker-java-home/jre
2017-11-02 14:06:50,948 INFO  org.apache.flink.runtime.taskmanager.TaskManager -  Hadoop version: 2.7.2
2017-11-02 14:06:50,948 INFO  org.apache.flink.runtime.taskmanager.TaskManager -  JVM Options:
2017-11-02 14:06:50,948 INFO  org.apache.flink.runtime.taskmanager.TaskManager -     -XX:+UseG1GC
2017-11-02 14:06:50,948 INFO  org.apache.flink.runtime.taskmanager.TaskManager -     -Xms1024M
2017-11-02 14:06:50,948 INFO  org.apache.flink.runtime.taskmanager.TaskManager -     -Xmx1024M
2017-11-02 14:06:50,948 INFO  org.apache.flink.runtime.taskmanager.TaskManager -     -XX:MaxDirectMemorySize=8388607T
2017-11-02 14:06:50,948 INFO  org.apache.flink.runtime.taskmanager.TaskManager -     -Dlog4j.configuration=file:/opt/flink/conf/log4j-console.properties
2017-11-02 14:06:50,949 INFO  org.apache.flink.runtime.taskmanager.TaskManager -     -Dlogback.configurationFile=file:/opt/flink/conf/logback-console.xml
2017-11-02 14:06:50,949 INFO  org.apache.flink.runtime.taskmanager.TaskManager -  Program Arguments:
2017-11-02 14:06:50,949 INFO  org.apache.flink.runtime.taskmanager.TaskManager -     --configDir
2017-11-02 14:06:50,949 INFO  org.apache.flink.runtime.taskmanager.TaskManager -     /opt/flink/conf
2017-11-02 14:06:50,949 INFO  org.apache.flink.runtime.taskmanager.TaskManager -  Classpath: /opt/flink/lib/flink-python_2.11-1.3.2.jar:/opt/flink/lib/flink-shaded-hadoop2-uber-1.3.2.jar:/opt/flink/lib/log4j-1.2.17.jar:/opt/flink/lib/slf4j-log4j12-1.7.7.jar:/opt/flink/lib/flink-dist_2.11-1.3.2.jar:::
2017-11-02 14:06:50,949 INFO  org.apache.flink.runtime.taskmanager.TaskManager - --------------------------------------------------------------------------------
2017-11-02 14:06:50,950 INFO  org.apache.flink.runtime.taskmanager.TaskManager - Registered UNIX signal handlers for [TERM, HUP, INT]
2017-11-02 14:06:50,953 INFO  org.apache.flink.runtime.taskmanager.TaskManager - Maximum number of open file descriptors is 1048576
2017-11-02 14:06:50,972 INFO  org.apache.flink.runtime.taskmanager.TaskManager - Loading configuration from /opt/flink/conf
2017-11-02 14:06:50,976 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, jobmanager
2017-11-02 14:06:50,976 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
2017-11-02 14:06:50,976 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.mb, 1024
2017-11-02 14:06:50,977 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.mb, 1024
2017-11-02 14:06:50,977 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 2
2017-11-02 14:06:50,977 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.preallocate, false
2017-11-02 14:06:50,977 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
2017-11-02 14:06:50,977 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.web.port, 8081
2017-11-02 14:06:50,978 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: blob.server.port, 6124
2017-11-02 14:06:50,978 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: query.server.port, 6125
2017-11-02 14:06:50,985 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, jobmanager
2017-11-02 14:06:50,986 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
2017-11-02 14:06:50,986 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.mb, 1024
2017-11-02 14:06:50,986 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.mb, 1024
2017-11-02 14:06:50,986 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 2
2017-11-02 14:06:50,986 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.preallocate, false
2017-11-02 14:06:50,987 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
2017-11-02 14:06:50,987 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.web.port, 8081
2017-11-02 14:06:50,988 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: blob.server.port, 6124
2017-11-02 14:06:50,988 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: query.server.port, 6125
2017-11-02 14:06:51,013 INFO  org.apache.flink.runtime.security.modules.HadoopModule - Hadoop user set to flink (auth:SIMPLE)
2017-11-02 14:06:51,064 INFO  org.apache.flink.runtime.util.LeaderRetrievalUtils - Trying to select the network interface and address to use by connecting to the leading JobManager.
2017-11-02 14:06:51,065 INFO  org.apache.flink.runtime.util.LeaderRetrievalUtils - TaskManager will try to connect for 10000 milliseconds before falling back to heuristics
2017-11-02 14:06:51,067 INFO  org.apache.flink.runtime.net.ConnectionUtils - Retrieved new target address jobmanager/10.0.0.2:6123.
2017-11-02 14:06:54,578 INFO  org.apache.flink.runtime.net.ConnectionUtils - Trying to connect to address jobmanager/10.0.0.2:6123
2017-11-02 14:06:54,779 INFO  org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '00afd4130a94/10.0.0.5': connect timed out
2017-11-02 14:06:54,829 INFO  org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.0.0.5': connect timed out
2017-11-02 14:06:54,880 INFO  org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.0.0.4': connect timed out
2017-11-02 14:06:54,931 INFO  org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/172.18.0.3': connect timed out
2017-11-02 14:06:54,981 INFO  org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.0.0.5': connect timed out
2017-11-02 14:06:55,031 INFO  org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.0.0.4': connect timed out
2017-11-02 14:06:55,032 INFO  org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/127.0.0.1': Invalid argument (connect failed)
2017-11-02 14:06:56,034 INFO  org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/172.18.0.3': connect timed out
2017-11-02 14:06:57,036 INFO  org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.0.0.5': connect timed out
2017-11-02 14:06:58,037 INFO  org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.0.0.4': connect timed out
2017-11-02 14:06:58,038 INFO  org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/127.0.0.1': Invalid argument (connect failed)
2017-11-02 14:06:58,138 INFO  org.apache.flink.runtime.net.ConnectionUtils - Trying to connect to address jobmanager/10.0.0.2:6123
2017-11-02 14:06:58,339 INFO  org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '00afd4130a94/10.0.0.5': connect timed out
2017-11-02 14:06:58,389 INFO  org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.0.0.5': connect timed out
2017-11-02 14:06:58,439 INFO  org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.0.0.4': connect timed out
2017-11-02 14:06:58,490 INFO  org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/172.18.0.3': connect timed out
2017-11-02 14:06:58,541 INFO  org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.0.0.5': connect timed out
2017-11-02 14:06:58,592 INFO  org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.0.0.4': connect timed out
2017-11-02 14:06:58,592 INFO  org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/127.0.0.1': Invalid argument (connect failed)
2017-11-02 14:06:59,593 INFO  org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/172.18.0.3': connect timed out
2017-11-02 14:07:00,595 INFO  org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.0.0.5': connect timed out
2017-11-02 14:07:01,599 INFO  org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.0.0.4': connect timed out
2017-11-02 14:07:01,599 INFO  org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/127.0.0.1': Invalid argument (connect failed)
2017-11-02 14:07:01,600 WARN  org.apache.flink.runtime.net.ConnectionUtils - Could not connect to jobmanager/10.0.0.2:6123. Selecting a local address using heuristics.
2017-11-02 14:07:01,601 INFO  org.apache.flink.runtime.taskmanager.TaskManager - TaskManager will use hostname/address '00afd4130a94' (10.0.0.5) for communication.
2017-11-02 14:07:01,601 INFO  org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager
2017-11-02 14:07:01,601 INFO  org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager actor system at 00afd4130a94:0.
2017-11-02 14:07:01,947 INFO  akka.event.slf4j.Slf4jLogger - Slf4jLogger started
2017-11-02 14:07:01,978 INFO  Remoting - Starting remoting
2017-11-02 14:07:02,168 INFO  Remoting - Remoting started; listening on addresses :[akka.tcp://flink@00afd4130a94:33881]
2017-11-02 14:07:02,174 INFO  org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager actor
2017-11-02 14:07:02,192 INFO  org.apache.flink.runtime.io.network.netty.NettyConfig - NettyConfig [server address: 00afd4130a94/10.0.0.5, server port: 0, ssl enabled: false, memory segment size (bytes): 32768, transport type: NIO, number of server threads: 2 (manual), number of client threads: 2 (manual), server connect backlog: 0 (use Netty's default), client connect timeout (sec): 120, send/receive buffer size (bytes): 0 (use Netty's default)]
2017-11-02 14:07:02,199 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerConfiguration - Messages have a max timeout of 10000 ms
2017-11-02 14:07:02,201 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerServices - Temporary file directory '/tmp': total 29 GB, usable 25 GB (86.21% usable)
2017-11-02 14:07:02,286 INFO  org.apache.flink.runtime.io.network.buffer.NetworkBufferPool - Allocated 101 MB for network buffer pool (number of memory segments: 3260, bytes per segment: 32768).
2017-11-02 14:07:02,393 INFO  org.apache.flink.runtime.io.network.NetworkEnvironment - Starting the network environment and its components.
2017-11-02 14:07:02,400 INFO  org.apache.flink.runtime.io.network.netty.NettyClient - Successful initialization (took 2 ms).
2017-11-02 14:07:02,434 INFO  org.apache.flink.runtime.io.network.netty.NettyServer - Successful initialization (took 32 ms). Listening on SocketAddress /10.0.0.5:42921.
2017-11-02 14:07:02,493 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerServices - Limiting managed memory to 0.7 of the currently free heap space (640 MB), memory will be allocated lazily.
2017-11-02 14:07:02,498 INFO  org.apache.flink.runtime.io.disk.iomanager.IOManager - I/O manager uses directory /tmp/flink-io-e57d51fa-2269-4df0-9910-0fe26c6042bd for spill files.
2017-11-02 14:07:02,501 INFO  org.apache.flink.runtime.metrics.MetricRegistry - No metrics reporter configured, no metrics will be exposed/reported.
2017-11-02 14:07:02,553 INFO  org.apache.flink.runtime.filecache.FileCache - User file cache uses directory /tmp/flink-dist-cache-2c0c063f-464e-48f1-9fb8-fcfa48868e3a
2017-11-02 14:07:02,564 INFO  org.apache.flink.runtime.filecache.FileCache - User file cache uses directory /tmp/flink-dist-cache-0c5e2b25-70a2-4964-9eec-24b0e79d560e
2017-11-02 14:07:02,572 INFO  org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager actor at akka://flink/user/taskmanager#1719715507.
2017-11-02 14:07:02,572 INFO  org.apache.flink.runtime.taskmanager.TaskManager - TaskManager data connection information: df5992297d269fa16a5e945e1dce0451 @ 00afd4130a94 (dataPort=42921)
2017-11-02 14:07:02,573 INFO  org.apache.flink.runtime.taskmanager.TaskManager - TaskManager has 2 task slot(s).
2017-11-02 14:07:02,574 INFO  org.apache.flink.runtime.taskmanager.TaskManager - Memory usage stats: [HEAP: 113/1024/1024 MB, NON HEAP: 33/33/-1 MB (used/committed/max)]
2017-11-02 14:07:02,576 INFO  org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@jobmanager:6123/user/jobmanager (attempt 1, timeout: 500 milliseconds)
2017-11-02 14:07:03,106 INFO  org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@jobmanager:6123/user/jobmanager (attempt 2, timeout: 1000 milliseconds)
2017-11-02 14:07:04,126 INFO  org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@jobmanager:6123/user/jobmanager (attempt 3, timeout: 2000 milliseconds){quote}
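In the connection phase of this log (the ConnectionUtils lines), every attempt to reach jobmanager/10.0.0.2:6123 times out, from each local address in turn, before the TaskManager falls back to heuristics. One way to check whether that port is reachable independently of Flink is a plain TCP probe run from a container on the same overlay network. The sketch below is a minimal, hypothetical helper (can_connect is not part of Flink), and it assumes Python 3 is available in whatever container runs it, which the stock Flink image may not provide. The optional source_ip argument binds the local end of the socket to one address, loosely mirroring the per-address attempts in the log.

```python
import socket
from typing import Optional


def can_connect(host: str, port: int, timeout_s: float = 1.0,
                source_ip: Optional[str] = None) -> bool:
    """Return True if a TCP connection to host:port completes within timeout_s.

    Hypothetical diagnostic helper, not part of Flink. If source_ip is given,
    the local end of the socket is bound to that address (port 0 lets the OS
    choose the port), loosely mirroring the per-address attempts in the log.
    """
    source = (source_ip, 0) if source_ip else None
    try:
        # create_connection resolves host (e.g. the Swarm service name) and
        # performs the TCP handshake; the socket is closed when the block exits.
        with socket.create_connection((host, port), timeout=timeout_s,
                                      source_address=source):
            return True
    except OSError:
        # Timeouts, refused connections and DNS failures are all OSError
        # subclasses in Python 3, so any of them counts as "not reachable".
        return False
```

For example, can_connect("jobmanager", 6123, timeout_s=2.0, source_ip="10.0.0.5") would repeat the first attempt in the log. A False result from the worker node, while the same probe returns True from a container on the manager node, would suggest the problem lies in cross-node traffic on the overlay network rather than in the Flink configuration.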


> Docker-Flink: TaskManagers can't find JobManager when in different nodes in Docker Swarm
> ----------------------------------------------------------------------------------------
>
>                 Key: FLINK-7965
>                 URL: https://issues.apache.org/jira/browse/FLINK-7965
>             Project: Flink
>          Issue Type: Bug
>          Components: Docker
>    Affects Versions: 1.3.2
>         Environment: node: ubuntu-swarm-master
> Azure VM Standard D4s v3 (4 vcpus, 16 GB memory)
> Docker version 17.03.1-ce, build c6d412e
> node: azure-swarm-worker-1
> Azure VM Standard D2 v2 Promo (2 vcpus, 7 GB memory)
> Docker version 17.09.0-ce, build afdb6d4
> Flink: using image 1.3.2-hadoop2-scala_2.10
>            Reporter: Thalita Vergilio
>            Priority: Major
>
> This happens even when the nodes are in the same subnet.
> I am using the Docker-Flink project in: 
> https://github.com/apache/flink/tree/master/flink-contrib/docker-flink
> I am creating the services with the following commands: 
> {quote}docker network create -d overlay overlay
> docker service create --name jobmanager --env JOB_MANAGER_RPC_ADDRESS=jobmanager -p 8081:8081 --network overlay --constraint 'node.hostname == ubuntu-swarm-manager' flink jobmanager
> docker service create --name taskmanager --env JOB_MANAGER_RPC_ADDRESS=jobmanager --network overlay --constraint 'node.hostname != ubuntu-swarm-manager' flink taskmanager {quote}
> I wonder if there's any configuration I'm missing. This is the error I get: 
> {quote} Trying to register at JobManager akka.tcp://flink@jobmanager:6123/user/jobmanager (attempt 4, timeout: 4000 milliseconds) {quote}
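The registration attempts reported here and in the TaskManager log follow a doubling timeout: 500, 1000, 2000 and then 4000 ms for attempts 1 to 4. The gaps between the attempt timestamps in the TaskManager log (about 0.53 s and 1.02 s) roughly match the preceding timeout. The sketch below reproduces that schedule; registration_timeouts is a hypothetical helper, and the upper bound max_ms is an assumption for illustration, since neither log shows where, or whether, the timeout stops growing.

```python
from typing import List, Tuple


def registration_timeouts(initial_ms: int = 500, max_ms: int = 30_000,
                          attempts: int = 4) -> List[Tuple[int, int]]:
    """Return (attempt, timeout_ms) pairs where each timeout doubles the last.

    initial_ms matches attempt 1 in the log; max_ms is an assumed cap, since
    the log does not show where (or whether) the growth stops.
    """
    schedule = []
    timeout = min(initial_ms, max_ms)
    for attempt in range(1, attempts + 1):
        schedule.append((attempt, timeout))
        # Double for the next attempt, but never exceed the assumed cap.
        timeout = min(timeout * 2, max_ms)
    return schedule
```

With the defaults, registration_timeouts() returns [(1, 500), (2, 1000), (3, 2000), (4, 4000)], matching attempts 1 to 4 above. The growing timeout only spaces out the retries; it cannot help while port 6123 on the JobManager is unreachable from the worker node.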



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
