Hi Till,
I'm getting the following messages in the JobManager log:

2016-09-09 07:46:55,093 PDT [WARN]  ip-10-8-11-249 [flink-akka.actor.default-dispatcher-985] akka.remote.RemoteWatcher - Detected unreachable: [akka.tcp://flink@
2016-09-09 07:46:55,094 PDT [INFO]  ip-10-8-11-249 o.a.f.runtime.jobmanager.JobManager - Task manager akka.tcp://flink@ terminated.
2016-09-09 07:46:55,094 PDT [INFO]  ip-10-8-11-249 [flink-akka.actor.default-dispatcher-985] o.a.f.r.instance.InstanceManager - Unregistered task manager akka.tcp://flink@ Number of registered task managers 2. Number of available slots 4.
2016-09-09 07:46:55,096 PDT [WARN]  ip-10-8-11-249 [flink-akka.actor.default-dispatcher-982] Remoting - Association to [akka.tcp://flink@] having UID [-1223410403] is irrecoverably failed. UID is now quarantined and all messages to this UID will be delivered to dead letters. Remote actorsystem must be restarted to recover from this situation.
2016-09-09 07:46:55,097 PDT [INFO]  ip-10-8-11-249 [flink-akka.actor.default-dispatcher-982] akka.actor.LocalActorRef - Message [akka.remote.transport.AssociationHandle$Disassociated] from Actor[akka://flink/deadLetters] to was not delivered. [54] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
2016-09-09 07:46:55,098 PDT [INFO]  ip-10-8-11-249 [flink-akka.actor.default-dispatcher-985] akka.actor.LocalActorRef - Message [akka.remote.transport.AssociationHandle$Disassociated] from Actor[akka://flink/deadLetters] to was not delivered. [55] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
2016-09-09 07:46:58,479 PDT [INFO]  ip-10-8-11-249 [ForkJoinPool-3-worker-1] o.a.f.r.c.ZooKeeperCompletedCheckpointStore - Recovering checkpoints from ZooKeeper.
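Along the lines of Till's suggestion, the relevant events can be pulled out of a JobManager log with a small script. This is a minimal sketch, not a Flink tool: the marker phrases are assumptions taken from the excerpt above, and the script rejoins entries that were wrapped across lines (as they were in this email) before matching.

```python
import re
from typing import Iterable, List, Tuple

# Marker phrases taken from the JobManager log excerpt above (an assumption
# about which messages are worth flagging; extend as needed).
MARKERS = (
    "Detected unreachable",
    "is now quarantined",
    "Unregistered task manager",
    "terminated.",
)

# Each log entry starts with a timestamp such as "2016-09-09 07:46:55,093".
ENTRY_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")


def group_entries(lines: Iterable[str]) -> List[str]:
    """Join wrapped continuation lines onto the entry they belong to."""
    entries: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if ENTRY_START.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line.strip()
    return entries


def find_events(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (timestamp, marker) pairs for every entry containing a marker."""
    events: List[Tuple[str, str]] = []
    for entry in group_entries(lines):
        for marker in MARKERS:
            if marker in entry:
                # The first 23 characters are "YYYY-MM-DD HH:MM:SS,mmm".
                events.append((entry[:23], marker))
    return events
```

For example, `with open("jobmanager.log") as f: print(find_events(f))` would list when the TaskManager was flagged unreachable, quarantined and unregistered; the file name is hypothetical.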

Hope this helps. I'm using Flink 1.0.2.

On Fri, Sep 9, 2016 at 12:34 AM, Till Rohrmann <trohrm...@apache.org> wrote:

> Hi Deepak,
> could you check in the logs whether the JobManager has been quarantined and
> thus can no longer be connected to? The logs should at least contain a
> hint as to why the TaskManager lost the connection initially.
> Cheers,
> Till
> On Thu, Sep 8, 2016 at 7:08 PM, Deepak Jha <dkjhan...@gmail.com> wrote:
> > Hi,
> > I've set up Flink HA on AWS (3 TaskManagers and 2 JobManagers, each on an
> > EC2 m4.large instance, with checkpointing enabled on S3). My topology works
> > fine, but after a few hours the TaskManagers get detached from the
> > JobManager. When this happens I can still reach the JobManager with
> > telnet, but the TaskManager does not manage to reconnect; it reattaches
> > only after I restart it. I tried the following settings, but the problem
> > persists:
> >
> > akka.ask.timeout: 20 s
> > akka.lookup.timeout: 20 s
> > akka.watch.heartbeat.interval: 20 s
> >
> > Please find attached a snapshot from one of the TaskManagers. Is there any
> > setting I need to change?
> >
> > --
> > Thanks,
> > Deepak Jha
> >
> >

Deepak Jha
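For context on the "Detected unreachable" warning: as far as I understand it, Akka's remote death watch (akka.remote.RemoteWatcher) decides reachability with a phi accrual failure detector, and in the Akka versions Flink used at the time an unreachable node is then quarantined, which matches the sequence in the log above. Flink feeds that detector through its akka.watch.heartbeat.interval, akka.watch.heartbeat.pause and akka.watch.threshold options. The sketch below is a simplified Python re-implementation of the phi formula from Akka's PhiAccrualFailureDetector, meant only to show that the acceptable pause, not the interval alone, decides how long a peer may stay silent. The function names, the minimum standard deviation and the numbers in the example are illustrative assumptions, not Flink defaults.

```python
import math
from statistics import mean, pstdev
from typing import Sequence


def phi(elapsed_ms: float, mean_ms: float, std_ms: float) -> float:
    """Suspicion level for a peer last heard from elapsed_ms ago.

    Uses the logistic approximation of the normal CDF found in Akka's
    PhiAccrualFailureDetector (simplified re-implementation).
    """
    y = (elapsed_ms - mean_ms) / std_ms
    exponent = -y * (1.5976 + 0.070566 * y * y)
    if exponent > 700:    # far from overdue: phi is effectively 0
        return 0.0
    if exponent < -700:   # hopelessly overdue: phi is unbounded
        return math.inf
    e = math.exp(exponent)
    if elapsed_ms > mean_ms:
        return -math.log10(e / (1.0 + e))
    return -math.log10(1.0 - 1.0 / (1.0 + e))


def is_unreachable(elapsed_ms: float, intervals_ms: Sequence[float],
                   acceptable_pause_ms: float, threshold: float,
                   min_std_ms: float = 100.0) -> bool:
    """Like Akka: add the acceptable pause to the mean heartbeat interval,
    floor the standard deviation, and flag the peer once phi >= threshold."""
    m = mean(intervals_ms) + acceptable_pause_ms
    s = max(pstdev(intervals_ms), min_std_ms)
    return phi(elapsed_ms, m, s) >= threshold
```

With 20 s heartbeats and a hypothetical 20 s acceptable pause, `is_unreachable(30_000, [20_000.0] * 10, 20_000, 12.0)` is False, while 41 s of silence crosses the threshold; raising the pause to 60 s moves that point out to roughly 80 s.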
