[ 
https://issues.apache.org/jira/browse/FLINK-25992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490624#comment-17490624
 ] 

Liu commented on FLINK-25992:
-----------------------------

Thanks [~roman]. I have checked the code and the log. When the job restores 
from checkpoint 2, the method reportRestoredCheckpoint in 
CheckpointStatsTracker is called. So there is an update in progress and the 
member dirty is set to true. In the method createSnapshot, the snapshot is 
skipped only when statsReadWriteLock.tryLock() returns false. Additionally, 
createSnapshot is also triggered while awaiting the job status in line 138 of 
JobDispatcherITCase.
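
To make the interplay between dirty and tryLock() easier to follow, here is a 
minimal sketch of the pattern as I understand it. It is not the actual 
CheckpointStatsTracker code: the class name StatsTrackerSketch, the field 
restoredCheckpointId and the string-valued snapshot are assumptions made up 
for illustration. Only the names dirty, statsReadWriteLock, 
reportRestoredCheckpoint and createSnapshot come from the discussion above.

```java
import java.util.concurrent.locks.ReentrantLock;

// Simplified, hypothetical stand-in for the stats tracker; not Flink code.
final class StatsTrackerSketch {
    private final ReentrantLock statsReadWriteLock = new ReentrantLock();
    // Set by writers, cleared once a fresh snapshot has been built.
    private volatile boolean dirty;
    private long restoredCheckpointId = -1L;
    // Last published snapshot (a plain string here, for illustration only).
    private volatile String latestSnapshot = "none";

    // Writer path: records the restore under the lock and marks stats dirty.
    void reportRestoredCheckpoint(long checkpointId) {
        statsReadWriteLock.lock();
        try {
            restoredCheckpointId = checkpointId;
            dirty = true;
        } finally {
            statsReadWriteLock.unlock();
        }
    }

    // Reader path: rebuilds the snapshot only if the stats are dirty AND the
    // lock can be taken without blocking; otherwise returns the old snapshot.
    String createSnapshot() {
        if (dirty && statsReadWriteLock.tryLock()) {
            try {
                latestSnapshot = "restored=" + restoredCheckpointId;
                dirty = false;
            } finally {
                statsReadWriteLock.unlock();
            }
        }
        return latestSnapshot;
    }
}
```

In this sketch, if another thread holds the lock while createSnapshot runs, 
tryLock() returns false, the old snapshot is returned and dirty stays true, so 
a later call can still pick up the update.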

Based on the above, it is hard to determine the root cause. Your suggestion is 
valuable, but I am not sure whether it resolves the problem completely. What 
do you think?

> JobDispatcherITCase.testRecoverFromCheckpointAfterLosingAndRegainingLeadership
>  fails on azure
> ---------------------------------------------------------------------------------------------
>
>                 Key: FLINK-25992
>                 URL: https://issues.apache.org/jira/browse/FLINK-25992
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination, Tests
>    Affects Versions: 1.15.0
>            Reporter: Roman Khachatryan
>            Priority: Major
>              Labels: test-stability
>             Fix For: 1.15.0
>
>         Attachments: mvn-2.log
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=30871&view=logs&j=0da23115-68bb-5dcd-192c-bd4c8adebde1&t=24c3384f-1bcb-57b3-224f-51bf973bbee8&l=9154
> {code}
> 19:41:35,515 [flink-akka.actor.default-dispatcher-9] WARN  
> org.apache.flink.runtime.taskmanager.Task                    [] - jobVertex 
> (1/1)#0 (7efdea21f5f95490e02117063ce8a314) switched from RUNNING to FAILED 
> with failure cause: java.lang.RuntimeException: Error while notify checkpoint 
> ABORT.
>       at 
> org.apache.flink.runtime.taskmanager.Task.notifyCheckpoint(Task.java:1457)
>       at 
> org.apache.flink.runtime.taskmanager.Task.notifyCheckpointAborted(Task.java:1407)
>       at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor.abortCheckpoint(TaskExecutor.java:1021)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$handleRpcInvocation$1(AkkaRpcActor.java:316)
>       at 
> org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:83)
>       at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:314)
>       at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:217)
>       at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:163)
>       at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:24)
>       at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:20)
>       at scala.PartialFunction.applyOrElse(PartialFunction.scala:123)
>       at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122)
>       at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:20)
>       at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
>       at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
>       at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
>       at akka.actor.Actor.aroundReceive(Actor.scala:537)
>       at akka.actor.Actor.aroundReceive$(Actor.scala:535)
>       at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:220)
>       at akka.actor.ActorCell.receiveMessage(ActorCell.scala:580)
>       at akka.actor.ActorCell.invoke(ActorCell.scala:548)
>       at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270)
>       at akka.dispatch.Mailbox.run(Mailbox.scala:231)
>       at akka.dispatch.Mailbox.exec(Mailbox.scala:243)
>       at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
>       at 
> java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
>       at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
>       at 
> java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
> Caused by: java.lang.UnsupportedOperationException: 
> notifyCheckpointAbortAsync not supported by 
> org.apache.flink.runtime.dispatcher.JobDispatcherITCase$AtLeastOneCheckpointInvokable
>       at 
> org.apache.flink.runtime.jobgraph.tasks.AbstractInvokable.notifyCheckpointAbortAsync(AbstractInvokable.java:205)
>       at 
> org.apache.flink.runtime.taskmanager.Task.notifyCheckpoint(Task.java:1430)
>       ... 31 more
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
