[ https://issues.apache.org/jira/browse/FLINK-8887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Till Rohrmann resolved FLINK-8887.
----------------------------------
    Resolution: Fixed

Fixed via
master: c0dddc1a6e3eef8ded1963e61ee4a8f8ecf66475
1.5.0: 11548456d52c22e1e31fea122de106b1b76a0618

> ClusterClient.getJobStatus can throw FencingTokenException
> ----------------------------------------------------------
>
>                 Key: FLINK-8887
>                 URL: https://issues.apache.org/jira/browse/FLINK-8887
>             Project: Flink
>          Issue Type: Bug
>          Components: Distributed Coordination
>    Affects Versions: 1.5.0
>            Reporter: Gary Yao
>            Assignee: Till Rohrmann
>            Priority: Blocker
>              Labels: flip-6
>             Fix For: 1.5.0
>
>
> *Description*
> Calling {{RestClusterClient.getJobStatus}} or {{MiniClusterClient.getJobStatus}} can result in a {{FencingTokenException}}.
> *Analysis*
> {{Dispatcher.requestJobStatus}} first looks up the {{JobManagerRunner}} by job id. If a runner is found, {{requestJobStatus}} is called on that instance; if not, the {{ArchivedExecutionGraphStore}} is queried. However, between the lookup and the call, the job's {{JobMaster}} may already have lost leadership (because the job finished) and set its fencing token to {{null}}, so the call is rejected with a {{FencingTokenException}}.
> *Stacktrace*
> {noformat}
> Caused by: org.apache.flink.runtime.rpc.exceptions.FencingTokenException: Fencing token mismatch: Ignoring message LocalFencedMessage(null, LocalRpcInvocation(requestJobStatus(Time))) because the fencing token null did not match the expected fencing token b8423c75bc6838244b8c93c8bd4a4f51.
>     at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleMessage(FencedAkkaRpcActor.java:73)
>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$onReceive$1(AkkaRpcActor.java:132)
>     at akka.actor.ActorCell$$anonfun$become$1.applyOrElse(ActorCell.scala:544)
>     at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
>     at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
>     at akka.actor.ActorCell.invoke(ActorCell.scala:495)
>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
>     at akka.dispatch.Mailbox.run(Mailbox.scala:224)
>     at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
>     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> {noformat}
>
> {noformat}
> Caused by: org.apache.flink.runtime.rpc.exceptions.FencingTokenException: Fencing token not set: Ignoring message LocalFencedMessage(null, LocalRpcInvocation(requestJobStatus(Time))) because the fencing token is null.
>     at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleMessage(FencedAkkaRpcActor.java:56)
>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$onReceive$1(AkkaRpcActor.java:132)
>     at akka.actor.ActorCell$$anonfun$become$1.applyOrElse(ActorCell.scala:544)
>     at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
>     at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
>     at akka.actor.ActorCell.invoke(ActorCell.scala:495)
>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
>     at akka.dispatch.Mailbox.run(Mailbox.scala:224)
>     at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
>     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
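The *Analysis* in this issue describes a check-then-act race: the dispatcher looks up a job's master, and the master may drop its fencing token before the call reaches it. Below is a minimal, self-contained Java sketch of that window and of one possible mitigation, which treats a stale-token rejection as "the job has already finished" and answers from the archive instead. Every name in it (`FencedMaster`, `DispatcherSketch`, `StaleTokenException`, `requestJobStatusNaive`, and so on) is a hypothetical stand-in rather than Flink's API. The sketch assumes the finished job is archived *before* its master clears the token, and it is not a description of the change that was actually committed for this issue.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ConcurrentHashMap;

public class FencedLookupSketch {

    enum JobStatus { RUNNING, FINISHED }

    /** Hypothetical stand-in for Flink's FencingTokenException. */
    static final class StaleTokenException extends RuntimeException {
        StaleTokenException(String message) { super(message); }
    }

    static <T> CompletableFuture<T> failed(Throwable cause) {
        CompletableFuture<T> future = new CompletableFuture<>();
        future.completeExceptionally(cause);
        return future;
    }

    /** Hypothetical fenced endpoint: rejects every call once its token is cleared. */
    static final class FencedMaster {
        private volatile UUID fencingToken = UUID.randomUUID();
        private volatile JobStatus status = JobStatus.RUNNING;

        CompletableFuture<JobStatus> requestJobStatus() {
            if (fencingToken == null) {
                return failed(new StaleTokenException("fencing token is null"));
            }
            return CompletableFuture.completedFuture(status);
        }

        /** Job finished: record the final status, then give up leadership by dropping the token. */
        void finish() {
            status = JobStatus.FINISHED;
            fencingToken = null;
        }
    }

    /** Hypothetical dispatcher: registered masters plus an archive of finished jobs. */
    static final class DispatcherSketch {
        final Map<String, FencedMaster> runningJobs = new ConcurrentHashMap<>();
        final Map<String, JobStatus> archive = new ConcurrentHashMap<>();

        /** Lookup-then-call with no recovery: a stale master surfaces the exception. */
        CompletableFuture<JobStatus> requestJobStatusNaive(String jobId) {
            FencedMaster master = runningJobs.get(jobId);
            return master != null ? master.requestJobStatus() : fromArchive(jobId);
        }

        /** Same lookup, but a stale-token rejection falls back to the archive. */
        CompletableFuture<JobStatus> requestJobStatus(String jobId) {
            FencedMaster master = runningJobs.get(jobId);
            if (master == null) {
                return fromArchive(jobId);
            }
            return master.requestJobStatus()
                    .handle((status, error) -> recover(jobId, status, error))
                    .thenCompose(future -> future);
        }

        private CompletableFuture<JobStatus> recover(String jobId, JobStatus status, Throwable error) {
            if (error == null) {
                return CompletableFuture.completedFuture(status);
            }
            Throwable cause = error instanceof CompletionException ? error.getCause() : error;
            // Only the "master already gave up leadership" case is recovered; other errors propagate.
            return cause instanceof StaleTokenException ? fromArchive(jobId) : failed(cause);
        }

        CompletableFuture<JobStatus> fromArchive(String jobId) {
            JobStatus archived = archive.get(jobId);
            return archived != null
                    ? CompletableFuture.completedFuture(archived)
                    : failed(new IllegalStateException("unknown job " + jobId));
        }
    }

    public static void main(String[] args) {
        DispatcherSketch dispatcher = new DispatcherSketch();
        FencedMaster master = new FencedMaster();
        dispatcher.runningJobs.put("job-1", master);
        System.out.println(dispatcher.requestJobStatus("job-1").join()); // prints RUNNING

        // The window from the Analysis: the job has finished (archived, token dropped),
        // but the dispatcher still has its master registered.
        dispatcher.archive.put("job-1", JobStatus.FINISHED);
        master.finish();

        System.out.println("naive call failed: "
                + dispatcher.requestJobStatusNaive("job-1").isCompletedExceptionally()); // prints naive call failed: true
        System.out.println(dispatcher.requestJobStatus("job-1").join()); // prints FINISHED
    }
}
```

The fallback is deliberately narrow: only the stale-token rejection is retried against the archive, so unrelated failures still reach the caller. If the assumed ordering (archive first, then drop the token) did not hold, the fallback could find nothing in the archive and fail with "unknown job" instead.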