Hi Deepak,

Could you check the logs to see whether the JobManager has been quarantined
and thus cannot be connected to anymore? The logs should at least contain a
hint as to why the TaskManager lost the connection initially.
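
If it helps, here is a minimal sketch for scanning a TaskManager or
JobManager log for such hints. The keywords in PATTERN are assumptions about
how the Akka remoting messages are worded, and the script name in the usage
comment is hypothetical; adjust both to what your logs actually contain.

```python
import re
import sys

# Assumed keywords; the exact wording of the log messages may differ
# between Flink/Akka versions.
PATTERN = re.compile(r"quarantin|disassociated|unreachable", re.IGNORECASE)


def find_connection_hints(lines):
    """Return (line_number, text) pairs for the lines that match PATTERN."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if PATTERN.search(line)
    ]


if __name__ == "__main__":
    # Usage (hypothetical script name): python scan_log.py <log file> [...]
    for path in sys.argv[1:]:
        with open(path, errors="replace") as log_file:
            for number, text in find_connection_hints(log_file):
                print(f"{path}:{number}: {text}")
```

The lines just before the first match are usually the interesting ones, since
they should show what happened right before the connection was lost.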

Cheers,
Till

On Thu, Sep 8, 2016 at 7:08 PM, Deepak Jha <dkjhan...@gmail.com> wrote:

> Hi,
> I've set up Flink HA on AWS (3 TaskManagers and 2 JobManagers, each on an
> EC2 m4.large instance, with checkpointing enabled on S3). My topology works
> fine, but after a few hours I see that the TaskManagers get detached from
> the JobManager. I tried to reach the JobManager using telnet at the same
> time and it worked, but the TaskManager does not succeed in connecting
> again. It attaches only after I restart it. I tried the following settings,
> but the problem still persists.
>
> akka.ask.timeout: 20 s
> akka.lookup.timeout: 20 s
> akka.watch.heartbeat.interval: 20 s
>
> Please find attached a snapshot from one of the TaskManagers. Is there any
> setting that I need to change?
>
> --
> Thanks,
> Deepak Jha
>
>
