Hi Stephan,
I was able to figure out the issue. Here is my explanation.

As I said, I'm trying to set up a Flink HA cluster in Docker containers
managed by Amazon ECS, with a remote ZooKeeper cluster running in AWS.
There are a few issues when deploying it with Docker:

--- Flink uses *jobmanager.rpc.address* both to bind and as the address it
stores in ZooKeeper. This address could be either the host's IP address or
the running container's IP address. If I set it to the host's IP address,
the jobmanager cannot bind, because that is not the container's IP address.
If I use the container's IP address, it can bind, but the address it pushes
to ZooKeeper is then the container's IP address, so remote taskmanagers
cannot discover it. Ideally *jobmanager.rpc.address* should be split into
*jobmanager.bind.address* (the address the jobmanager binds to) and
*jobmanager.discovery.address* (the address published in ZooKeeper so that
remote taskmanagers can discover it).

For example, let's assume:

EC2_Instance_Ip = 1.1.1.1
Container_Ip = 2.2.2.2 (This container is running in this EC2_Instance)
recovery.jobmanager.port = 3000
jobmanager.web.port = 8080
I mapped port 3000 on the container to 3000 on the host, and 8080 on the
container to 8080 on the host.

In flink-conf.yaml, assume:
*Case 1*
     jobmanager.rpc.address = 2.2.2.2  (container's IP address)
     Now 2.2.2.2 is written to ZooKeeper. An external taskmanager will try
to use this address to communicate with the jobmanager, but it will not be
able to connect, since 2.2.2.2 is not reachable from outside the EC2
instance hosting the container.

*Case 2*
   jobmanager.rpc.address = 1.1.1.1  (EC2 instance's IP address)
   The container does not own this address, so the jobmanager will not be
able to bind at all.

As you can see, we need two IP addresses: one for binding and another for
discovery.
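
The bind failure in Case 2 is not specific to Flink. Here is a minimal
Python sketch of it, under some assumptions: the helper name try_bind is
made up for illustration, and 192.0.2.1 (a documentation-only address) stands
in for a host IP that the container does not own. On a typical host the
second call should fail with the same "Cannot assign requested address" error
that appears in the jobmanager log, unless the kernel is configured to allow
non-local binds.

```python
import errno
import socket


def try_bind(host: str, port: int = 0) -> str:
    """Try to bind a TCP socket to (host, port) and describe the outcome.

    Hypothetical helper for illustration only; it is not part of Flink.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            # EADDRNOTAVAIL is the errno behind "Cannot assign requested address".
            if exc.errno == errno.EADDRNOTAVAIL:
                return "address not assigned to a local interface"
            return f"bind failed: {exc}"
        return "bound"


if __name__ == "__main__":
    # An address the process owns (like the container's own IP in Case 1).
    print("127.0.0.1:", try_bind("127.0.0.1"))
    # An address the process does not own (like the host IP in Case 2).
    print("192.0.2.1:", try_bind("192.0.2.1"))
```

This only illustrates the binding half of the problem; the discovery half
(which address ends up in ZooKeeper) is separate.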

---- In the Docker world we have to expose every port we want to use (in
bridged network mode). By default the jobmanager uses a random port for
communication. Since we cannot know that port in advance, we set
*recovery.jobmanager.port* and exposed it in the Dockerfile. The same
applies to blob.server.port on the taskmanagers.
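
To illustrate why a random port cannot be exposed ahead of time, here is a
small Python sketch (the helper name bound_port is made up for illustration;
this is not how Flink picks its ports internally). Requesting port 0 asks the
OS to choose a free port, and the actual number is only known after the bind
succeeds, which is too late for a Dockerfile or port mapping written in
advance.

```python
import socket


def bound_port(requested_port: int, host: str = "127.0.0.1") -> int:
    """Bind a TCP socket and return the port number the OS actually assigned.

    Hypothetical helper for illustration only.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, requested_port))
        # getsockname() returns (address, port) for an AF_INET socket.
        return sock.getsockname()[1]


if __name__ == "__main__":
    # Port 0 means "any free port"; the result varies from run to run.
    print("requested 0, got", bound_port(0))
```

With a fixed port in the config, the number is known up front and can be
listed in the container's exposed ports.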

Hope that clarifies it. Please let me know if you have any other questions.

On Thu, Mar 10, 2016 at 10:47 AM, Stephan Ewen <se...@apache.org> wrote:

> Hi!
>
> Is it possible that the docker container config forbids to open ports?
> Flink will try to open some ports and needs the OS or container to permit
> that.
>
> Greetings,
> Stephan
>
>
> On Thu, Mar 10, 2016 at 6:27 PM, Deepak Jha <dkjhan...@gmail.com> wrote:
>
> > Hi Stephan,
> > I tried 0.10.2 as well still running into the same issue.
> >
> > On Thursday, March 10, 2016, Deepak Jha <dkjhan...@gmail.com> wrote:
> >
> > > Yes. Flink 1.0.0
> > >
> > > On Thursday, March 10, 2016, Stephan Ewen <se...@apache.org> wrote:
> > >
> > >> Hi!
> > >>
> > >> Is this Flink 1.0.0 ?
> > >>
> > >> Stephan
> > >>
> > >>
> > >> On Thu, Mar 10, 2016 at 6:02 AM, Deepak Jha <dkjhan...@gmail.com>
> > wrote:
> > >>
> > >> > Hi All,
> > >> >
> > >> > I'm trying to setup Flink 1.0.0 cluster on Docker (separate
> containers
> > >> for
> > >> > jobmanager and taskmanager) inside AWS (Using AWS ECS service). I
> > >> tested it
> > >> > locally and its working fine but on AWS Docker, I am running into
> > >> following
> > >> > issue
> > >> >
> > >> > *2016-03-09 18:04:12,114 PST [INFO]  ec2-52-3-248-202.compute-1.ama
> > >> [main]
> > >> > o.a.f.runtime.jobmanager.JobManager - Starting JobManager with
> > >> > high-availability*
> > >> > *2016-03-09 18:04:12,118 PST [INFO]  ec2-52-3-248-202.compute-1.ama
> > >> [main]
> > >> > o.a.f.runtime.jobmanager.JobManager - Starting JobManager on
> > >> > 172.31.63.152:8079 <http://172.31.63.152:8079> with execution mode
> > >> CLUSTER*
> > >> > *2016-03-09 18:04:12,172 PST [INFO]  ec2-52-3-248-202.compute-1.ama
> > >> [main]
> > >> > o.a.f.runtime.jobmanager.JobManager - Security is not enabled.
> > Starting
> > >> > non-authenticated JobManager.*
> > >> > *2016-03-09 18:04:12,174 PST [DEBUG] ec2-52-3-248-202.compute-1.ama
> > >> [main]
> > >> > org.apache.flink.util.NetUtils - Trying to open socket on port 8079*
> > >> > *2016-03-09 18:04:12,176 PST [DEBUG] ec2-52-3-248-202.compute-1.ama
> > >> [main]
> > >> > org.apache.flink.util.NetUtils - Unable to allocate socket on port*
> > >> > *java.net.BindException: Cannot assign requested address*
> > >> > *    at java.net.PlainSocketImpl.socketBind(Native Method)*
> > >> > *    at
> > >> >
> > java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)*
> > >> > *    at java.net.ServerSocket.bind(ServerSocket.java:375)*
> > >> > *    at java.net.ServerSocket.<init>(ServerSocket.java:237)*
> > >> > *    at
> > >> >
> > >>
> >
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$2$$anon$3.createSocket(JobManager.scala:1722)*
> > >> > *    at
> > >> >
> > org.apache.flink.util.NetUtils.createSocketFromPorts(NetUtils.java:237)*
> > >> > *    at
> > >> >
> > >>
> >
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$2.apply$mcV$sp(JobManager.scala:1719)*
> > >> > *    at
> > >> >
> > >>
> >
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$2.apply(JobManager.scala:1717)*
> > >> > *    at
> > >> >
> > >>
> >
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$2.apply(JobManager.scala:1717)*
> > >> > *    at scala.util.Try$.apply(Try.scala:192)*
> > >> > *    at
> > >> >
> > >>
> >
> org.apache.flink.runtime.jobmanager.JobManager$.retryOnBindException(JobManager.scala:1772)*
> > >> > *    at
> > >> >
> > >>
> >
> org.apache.flink.runtime.jobmanager.JobManager$.runJobManager(JobManager.scala:1717)*
> > >> > *    at
> > >> >
> > >>
> >
> org.apache.flink.runtime.jobmanager.JobManager$.main(JobManager.scala:1653)*
> > >> > *    at
> > >> >
> org.apache.flink.runtime.jobmanager.JobManager.main(JobManager.scala)*
> > >> > *2016-03-09 18:04:12,180 PST [ERROR] ec2-52-3-248-202.compute-1.ama
> > >> [main]
> > >> > o.a.f.runtime.jobmanager.JobManager - Failed to run JobManager.*
> > >> > *java.lang.RuntimeException: Unable to do further retries starting
> the
> > >> > actor system*
> > >> > *    at
> > >> >
> > >>
> >
> org.apache.flink.runtime.jobmanager.JobManager$.retryOnBindException(JobManager.scala:1777)*
> > >> > *    at
> > >> >
> > >>
> >
> org.apache.flink.runtime.jobmanager.JobManager$.runJobManager(JobManager.scala:1717)*
> > >> > *    at
> > >> >
> > >>
> >
> org.apache.flink.runtime.jobmanager.JobManager$.main(JobManager.scala:1653)*
> > >> > *    at
> > >> >
> org.apache.flink.runtime.jobmanager.JobManager.main(JobManager.scala)*
> > >> > *2016-03-09 18:04:12,991 PST [DEBUG] ec2-52-3-248-202.compute-1.ama
> > >> [main]
> > >> > o.a.h.m.lib.MutableMetricsFactory - field
> > >> > org.apache.hadoop.metrics2.lib.MutableRate
> > >> >
> > org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess
> > >> > with annotation
> @org.apache.hadoop.metrics2.annotation.Metric(about=,
> > >> > sampleName=Ops, always=false, type=DEFAULT, value=[Rate of
> successful
> > >> > kerberos logins and latency (milliseconds)], valueName=Time)*
> > >> >
> > >> >
> > >> > Initially Jobmanager tries to bind to port 0 which did not work. On
> > >> > looking further into it, I tried using recovery jobmanager port
> using
> > >> > different port combinations, but it does not seems to be working...
> > I've
> > >> > exposed the ports in the docker compose file as well....
> > >> >
> > >> >
> > >> > PFA the jobmanager log file for details also the jobmanager config
> > >> file...
> > >> > --
> > >> > Thanks,
> > >> > Deepak Jha
> > >> >
> > >> >
> > >>
> > >
> > >
> > > --
> > > Sent from Gmail Mobile
> > >
> >
> >
> > --
> > Sent from Gmail Mobile
> >
>



-- 
Thanks,
Deepak Jha
