[ https://issues.apache.org/jira/browse/FLINK-14316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16946322#comment-16946322 ]
Steven Zhen Wu commented on FLINK-14316:
----------------------------------------

[~trohrmann] In the uploaded tarball, there is one TM log file; the rest are JM log files. This is the line from the TM log where the TM reports that the JM lost leadership.

```
2019-10-06 16:11:36,471 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - JobManager for job 3bb42eb7602c5ba25740d8360b1f0e27 with leader id 9aa48c6a49d009f7fb287754b61d4af8 lost leadership.
```

> stuck in "Job leader ... lost leadership" error
> -----------------------------------------------
>
>                 Key: FLINK-14316
>                 URL: https://issues.apache.org/jira/browse/FLINK-14316
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.7.2
>            Reporter: Steven Zhen Wu
>            Priority: Major
>         Attachments: FLINK-14316.tgz
>
>
> This is the first exception that caused the restart loop. Later exceptions are
> the same. The job seems to be stuck in this permanent failure state.
> {code}
> 2019-10-03 21:42:46,159 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: clpevents -> device_filter -> processed_imps -> ios_processed_impression -> imps_ts_assigner (449/1360) (d237f5e99b6a4a580498821473763edb) switched from SCHEDULED to FAILED.
> java.lang.Exception: Job leader for job id ecb9ad9be934edf7b1a4f7b9dd6df365 lost leadership.
>     at org.apache.flink.runtime.taskexecutor.TaskExecutor$JobLeaderListenerImpl.lambda$jobManagerLostLeadership$1(TaskExecutor.java:1526)
>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:332)
>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:158)
>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:142)
>     at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
>     at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
>     at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
>     at akka.actor.ActorCell.invoke(ActorCell.scala:495)
>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
>     at akka.dispatch.Mailbox.run(Mailbox.scala:224)
>     at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
>     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)