[ https://issues.apache.org/jira/browse/FLINK-25992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490624#comment-17490624 ]
Liu commented on FLINK-25992:
-----------------------------

Thanks [~roman]. I have checked the code and the log. When the job restores from checkpoint 2, the method reportRestoredCheckpoint in CheckpointStatsTracker is called, so there is an update in progress and the member dirty is set to true. In the method createSnapshot, the snapshot is not updated only when statsReadWriteLock.tryLock() returns false. Additionally, createSnapshot is also triggered while awaiting the job status in line 138 of JobDispatcherITCase.

Based on the above information, it is hard to find the root cause. Your suggestion is valuable, but I am not sure whether it can resolve the problem thoroughly. What do you think?

> JobDispatcherITCase.testRecoverFromCheckpointAfterLosingAndRegainingLeadership fails on azure
> ---------------------------------------------------------------------------------------------
>
>                 Key: FLINK-25992
>                 URL: https://issues.apache.org/jira/browse/FLINK-25992
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination, Tests
>    Affects Versions: 1.15.0
>            Reporter: Roman Khachatryan
>            Priority: Major
>              Labels: test-stability
>             Fix For: 1.15.0
>
>         Attachments: mvn-2.log
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=30871&view=logs&j=0da23115-68bb-5dcd-192c-bd4c8adebde1&t=24c3384f-1bcb-57b3-224f-51bf973bbee8&l=9154
> {code}
> 19:41:35,515 [flink-akka.actor.default-dispatcher-9] WARN org.apache.flink.runtime.taskmanager.Task [] - jobVertex (1/1)#0 (7efdea21f5f95490e02117063ce8a314) switched from RUNNING to FAILED with failure cause: java.lang.RuntimeException: Error while notify checkpoint ABORT.
>     at org.apache.flink.runtime.taskmanager.Task.notifyCheckpoint(Task.java:1457)
>     at org.apache.flink.runtime.taskmanager.Task.notifyCheckpointAborted(Task.java:1407)
>     at org.apache.flink.runtime.taskexecutor.TaskExecutor.abortCheckpoint(TaskExecutor.java:1021)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$handleRpcInvocation$1(AkkaRpcActor.java:316)
>     at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:83)
>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:314)
>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:217)
>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:163)
>     at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:24)
>     at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:20)
>     at scala.PartialFunction.applyOrElse(PartialFunction.scala:123)
>     at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122)
>     at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:20)
>     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
>     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
>     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
>     at akka.actor.Actor.aroundReceive(Actor.scala:537)
>     at akka.actor.Actor.aroundReceive$(Actor.scala:535)
>     at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:220)
>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:580)
>     at akka.actor.ActorCell.invoke(ActorCell.scala:548)
>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270)
>     at akka.dispatch.Mailbox.run(Mailbox.scala:231)
>     at akka.dispatch.Mailbox.exec(Mailbox.scala:243)
>     at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
>     at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
>     at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
>     at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
> Caused by: java.lang.UnsupportedOperationException: notifyCheckpointAbortAsync not supported by org.apache.flink.runtime.dispatcher.JobDispatcherITCase$AtLeastOneCheckpointInvokable
>     at org.apache.flink.runtime.jobgraph.tasks.AbstractInvokable.notifyCheckpointAbortAsync(AbstractInvokable.java:205)
>     at org.apache.flink.runtime.taskmanager.Task.notifyCheckpoint(Task.java:1430)
>     ... 31 more
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
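
Illustrative note on the comment above: the createSnapshot behaviour Liu describes (a dirty flag set by reportRestoredCheckpoint, and a snapshot refresh that is skipped when statsReadWriteLock.tryLock() fails) can be sketched roughly as follows. This is a simplified stand-in and not the actual CheckpointStatsTracker source: the class StatsTrackerSketch, the int-valued snapshot, and the helper snapshotUnderContention are all hypothetical and exist only to show how a caller can receive a stale snapshot while another thread holds the lock. It does not claim to identify the root cause of the test failure.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical, simplified stand-in for a stats tracker; not Flink's actual class. */
class StatsTrackerSketch {
    private final ReentrantLock statsReadWriteLock = new ReentrantLock();

    // Guarded by statsReadWriteLock.
    private int restoredCount;

    // Set by the writer path, cleared once a fresh snapshot has been built.
    private volatile boolean dirty;

    // Last published snapshot; in this sketch simply the restored-checkpoint count.
    private volatile int latestSnapshot;

    /** Writer path: record a restore and mark the cached snapshot as stale. */
    void reportRestoredCheckpoint() {
        statsReadWriteLock.lock();
        try {
            restoredCount++;
            dirty = true;
        } finally {
            statsReadWriteLock.unlock();
        }
    }

    /** Reader path: rebuild only if dirty and the lock is free, else return the cached value. */
    int createSnapshot() {
        int snapshot = latestSnapshot;
        if (dirty && statsReadWriteLock.tryLock()) {
            try {
                snapshot = restoredCount;
                latestSnapshot = snapshot;
                dirty = false;
            } finally {
                statsReadWriteLock.unlock();
            }
        }
        return snapshot;
    }

    /** Demo helper (assumption): calls createSnapshot while another thread holds the lock. */
    static int snapshotUnderContention(StatsTrackerSketch tracker) {
        CountDownLatch locked = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            tracker.statsReadWriteLock.lock();
            try {
                locked.countDown();
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                tracker.statsReadWriteLock.unlock();
            }
        });
        holder.start();
        try {
            locked.await();
            return tracker.createSnapshot();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            release.countDown();
            try {
                holder.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        StatsTrackerSketch tracker = new StatsTrackerSketch();
        tracker.reportRestoredCheckpoint();
        // Lock held elsewhere: tryLock() fails, so the old snapshot is returned.
        System.out.println(snapshotUnderContention(tracker)); // prints 0
        // Lock free again: the dirty flag triggers a refresh.
        System.out.println(tracker.createSnapshot()); // prints 1
    }
}
```

In this sketch a reader never blocks on the lock, which is why a snapshot taken during a concurrent update can lag behind until the next call to createSnapshot.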