[ 
https://issues.apache.org/jira/browse/FLINK-17768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17109487#comment-17109487
 ] 

Piotr Nowojski commented on FLINK-17768:
----------------------------------------

I think the problem might be caused by mishandling of trailing cancellation 
barriers. When I tried to reproduce the problem, I reproduced the failure for 
checkpoint 20, shortly after the following log entry was produced:
{noformat}
166935 [Sink: Unnamed (1/5)] WARN  
org.apache.flink.streaming.runtime.io.CheckpointBarrierUnaligner [] - Sink: 
Unnamed (1/5) (1a16b0a5d4089828a7db77762b2dfa4a): Received cancellation barrier 
for checkpoint 19 before completing current checkpoint 20. Skipping current 
checkpoint.
{noformat}
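
To make the hypothesis concrete, here is a minimal, self-contained sketch of 
the situation in the log entry above: a cancellation barrier for an older 
checkpoint (19) arrives while a newer checkpoint (20) is in progress. The 
class {{BarrierTracker}} and its methods are hypothetical and only model the 
idea; this is not the code of {{CheckpointBarrierUnaligner}}, and the guard 
shown is just one possible way to treat a trailing cancellation barrier, not 
necessarily the right fix for this issue.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical, simplified model of barrier bookkeeping.
 * It is NOT Flink's CheckpointBarrierUnaligner; names are illustrative only.
 */
final class BarrierTracker {
    private long currentCheckpointId = -1L;
    private final Set<Long> aborted = new HashSet<>();

    /** A regular barrier starts (or continues) the checkpoint with the given id. */
    void onBarrier(long checkpointId) {
        if (checkpointId > currentCheckpointId) {
            currentCheckpointId = checkpointId;
        }
    }

    /**
     * Handles a cancellation barrier.
     * Returns true if a checkpoint was aborted, false if the barrier was
     * ignored because it belongs to a checkpoint older than the current one.
     */
    boolean onCancellation(long cancelledId) {
        if (cancelledId < currentCheckpointId) {
            // Assumed guard: a trailing cancellation for an older checkpoint
            // must not affect the checkpoint that is currently in progress.
            return false;
        }
        currentCheckpointId = cancelledId;
        aborted.add(cancelledId);
        return true;
    }

    boolean isAborted(long checkpointId) {
        return aborted.contains(checkpointId);
    }

    long currentCheckpointId() {
        return currentCheckpointId;
    }
}

final class TrailingCancellationSketch {
    public static void main(String[] args) {
        BarrierTracker tracker = new BarrierTracker();
        tracker.onBarrier(20);                          // checkpoint 20 is in progress
        boolean aborted = tracker.onCancellation(19);   // late cancellation for 19
        System.out.println("cancellation for 19 aborted something: " + aborted);
        System.out.println("checkpoint 20 aborted: " + tracker.isAborted(20));
    }
}
```

With this guard, the late cancellation for checkpoint 19 is dropped and 
checkpoint 20 keeps running, whereas the warning above shows checkpoint 20 
being skipped in the same situation.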


> UnalignedCheckpointITCase.shouldPerformUnalignedCheckpointOnLocalAndRemoteChannel
>  is instable
> ---------------------------------------------------------------------------------------------
>
>                 Key: FLINK-17768
>                 URL: https://issues.apache.org/jira/browse/FLINK-17768
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.11.0
>            Reporter: Dian Fu
>            Priority: Blocker
>              Labels: test-stability
>             Fix For: 1.11.0
>
>
> UnalignedCheckpointITCase.shouldPerformUnalignedCheckpointOnLocalAndRemoteChannel
>  and shouldPerformUnalignedCheckpointOnParallelRemoteChannel failed in azure:
> {code}
> 2020-05-16T12:41:32.3546620Z [ERROR] 
> shouldPerformUnalignedCheckpointOnLocalAndRemoteChannel(org.apache.flink.test.checkpointing.UnalignedCheckpointITCase)
>   Time elapsed: 18.865 s  <<< ERROR!
> 2020-05-16T12:41:32.3548739Z java.util.concurrent.ExecutionException: 
> org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
> 2020-05-16T12:41:32.3550177Z  at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> 2020-05-16T12:41:32.3551416Z  at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
> 2020-05-16T12:41:32.3552959Z  at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1665)
> 2020-05-16T12:41:32.3554979Z  at 
> org.apache.flink.streaming.api.environment.LocalStreamEnvironment.execute(LocalStreamEnvironment.java:74)
> 2020-05-16T12:41:32.3556584Z  at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1645)
> 2020-05-16T12:41:32.3558068Z  at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1627)
> 2020-05-16T12:41:32.3559431Z  at 
> org.apache.flink.test.checkpointing.UnalignedCheckpointITCase.execute(UnalignedCheckpointITCase.java:158)
> 2020-05-16T12:41:32.3560954Z  at 
> org.apache.flink.test.checkpointing.UnalignedCheckpointITCase.shouldPerformUnalignedCheckpointOnLocalAndRemoteChannel(UnalignedCheckpointITCase.java:145)
> 2020-05-16T12:41:32.3562203Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2020-05-16T12:41:32.3563433Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2020-05-16T12:41:32.3564846Z  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2020-05-16T12:41:32.3565894Z  at 
> java.lang.reflect.Method.invoke(Method.java:498)
> 2020-05-16T12:41:32.3566870Z  at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> 2020-05-16T12:41:32.3568064Z  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 2020-05-16T12:41:32.3569727Z  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> 2020-05-16T12:41:32.3570818Z  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 2020-05-16T12:41:32.3571840Z  at 
> org.junit.rules.Verifier$1.evaluate(Verifier.java:35)
> 2020-05-16T12:41:32.3572771Z  at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
> 2020-05-16T12:41:32.3574008Z  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
> 2020-05-16T12:41:32.3575406Z  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
> 2020-05-16T12:41:32.3576476Z  at 
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 2020-05-16T12:41:32.3577253Z  at java.lang.Thread.run(Thread.java:748)
> 2020-05-16T12:41:32.3578228Z Caused by: 
> org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
> 2020-05-16T12:41:32.3579520Z  at 
> org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:147)
> 2020-05-16T12:41:32.3580935Z  at 
> org.apache.flink.client.program.PerJobMiniClusterFactory$PerJobMiniClusterJobClient.lambda$getJobExecutionResult$2(PerJobMiniClusterFactory.java:186)
> 2020-05-16T12:41:32.3582361Z  at 
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
> 2020-05-16T12:41:32.3583456Z  at 
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
> 2020-05-16T12:41:32.3584816Z  at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
> 2020-05-16T12:41:32.3585874Z  at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
> 2020-05-16T12:41:32.3587059Z  at 
> org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.lambda$invokeRpc$0(AkkaInvocationHandler.java:229)
> 2020-05-16T12:41:32.3588572Z  at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
> 2020-05-16T12:41:32.3589733Z  at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
> 2020-05-16T12:41:32.3590860Z  at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
> 2020-05-16T12:41:32.3591956Z  at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
> 2020-05-16T12:41:32.3593042Z  at 
> org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:890)
> 2020-05-16T12:41:32.3594105Z  at 
> akka.dispatch.OnComplete.internal(Future.scala:264)
> 2020-05-16T12:41:32.3595084Z  at 
> akka.dispatch.OnComplete.internal(Future.scala:261)
> 2020-05-16T12:41:32.3595937Z  at 
> akka.dispatch.japi$CallbackBridge.apply(Future.scala:191)
> 2020-05-16T12:41:32.3596828Z  at 
> akka.dispatch.japi$CallbackBridge.apply(Future.scala:188)
> 2020-05-16T12:41:32.3597800Z  at 
> scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
> 2020-05-16T12:41:32.3598856Z  at 
> org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:74)
> 2020-05-16T12:41:32.3600084Z  at 
> scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
> 2020-05-16T12:41:32.3601108Z  at 
> scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
> 2020-05-16T12:41:32.3602249Z  at 
> akka.pattern.PromiseActorRef.$bang(AskSupport.scala:572)
> 2020-05-16T12:41:32.3603396Z  at 
> akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:22)
> 2020-05-16T12:41:32.3605032Z  at 
> akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:21)
> 2020-05-16T12:41:32.3606307Z  at 
> scala.concurrent.Future$$anonfun$andThen$1.apply(Future.scala:436)
> 2020-05-16T12:41:32.3607287Z  at 
> scala.concurrent.Future$$anonfun$andThen$1.apply(Future.scala:435)
> 2020-05-16T12:41:32.3608294Z  at 
> scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
> 2020-05-16T12:41:32.3609322Z  at 
> akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
> 2020-05-16T12:41:32.3610521Z  at 
> akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91)
> 2020-05-16T12:41:32.3611745Z  at 
> akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
> 2020-05-16T12:41:32.3612950Z  at 
> akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
> 2020-05-16T12:41:32.3614288Z  at 
> scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
> 2020-05-16T12:41:32.3615488Z  at 
> akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:90)
> 2020-05-16T12:41:32.3616491Z  at 
> akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
> 2020-05-16T12:41:32.3617683Z  at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
> 2020-05-16T12:41:32.3618815Z  at 
> akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> 2020-05-16T12:41:32.3619806Z  at 
> akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> 2020-05-16T12:41:32.3621104Z  at 
> akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> 2020-05-16T12:41:32.3622154Z  at 
> akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 2020-05-16T12:41:32.3623458Z Caused by: 
> org.apache.flink.runtime.JobException: Recovery is suppressed by 
> FixedDelayRestartBackoffTimeStrategy(maxNumberRestartAttempts=5, 
> backoffTimeMS=100)
> 2020-05-16T12:41:32.3625232Z  at 
> org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:112)
> 2020-05-16T12:41:32.3626779Z  at 
> org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:78)
> 2020-05-16T12:41:32.3628274Z  at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:189)
> 2020-05-16T12:41:32.3629528Z  at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:183)
> 2020-05-16T12:41:32.3630831Z  at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:177)
> 2020-05-16T12:41:32.3632245Z  at 
> org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:505)
> 2020-05-16T12:41:32.3633438Z  at 
> org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:386)
> 2020-05-16T12:41:32.3634641Z  at 
> sun.reflect.GeneratedMethodAccessor22.invoke(Unknown Source)
> 2020-05-16T12:41:32.3635628Z  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2020-05-16T12:41:32.3636595Z  at 
> java.lang.reflect.Method.invoke(Method.java:498)
> 2020-05-16T12:41:32.3637729Z  at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:284)
> 2020-05-16T12:41:32.3638973Z  at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:199)
> 2020-05-16T12:41:32.3640199Z  at 
> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
> 2020-05-16T12:41:32.3641385Z  at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152)
> 2020-05-16T12:41:32.3642395Z  at 
> akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
> 2020-05-16T12:41:32.3643322Z  at 
> akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
> 2020-05-16T12:41:32.3644357Z  at 
> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
> 2020-05-16T12:41:32.3645466Z  at 
> akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
> 2020-05-16T12:41:32.3646487Z  at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
> 2020-05-16T12:41:32.3647729Z  at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
> 2020-05-16T12:41:32.3648715Z  at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
> 2020-05-16T12:41:32.3649611Z  at 
> akka.actor.Actor$class.aroundReceive(Actor.scala:517)
> 2020-05-16T12:41:32.3650511Z  at 
> akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
> 2020-05-16T12:41:32.3651410Z  at 
> akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
> 2020-05-16T12:41:32.3652257Z  at 
> akka.actor.ActorCell.invoke(ActorCell.scala:561)
> 2020-05-16T12:41:32.3653213Z  at 
> akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
> 2020-05-16T12:41:32.3654166Z  at akka.dispatch.Mailbox.run(Mailbox.scala:225)
> 2020-05-16T12:41:32.3655121Z  at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
> 2020-05-16T12:41:32.3655842Z  ... 4 more
> 2020-05-16T12:41:32.3656578Z Caused by: java.io.IOException: Could not 
> perform checkpoint 20 for operator Map (1/5).
> 2020-05-16T12:41:32.3657800Z  at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:857)
> 2020-05-16T12:41:32.3659129Z  at 
> org.apache.flink.streaming.runtime.io.CheckpointBarrierHandler.notifyCheckpoint(CheckpointBarrierHandler.java:107)
> 2020-05-16T12:41:32.3660674Z  at 
> org.apache.flink.streaming.runtime.io.CheckpointBarrierUnaligner$ThreadSafeUnaligner.lambda$notifyBarrierReceived$0(CheckpointBarrierUnaligner.java:309)
> 2020-05-16T12:41:32.3662369Z  at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:47)
> 2020-05-16T12:41:32.3663713Z  at 
> org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:78)
> 2020-05-16T12:41:32.3665348Z  at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:285)
> 2020-05-16T12:41:32.3666971Z  at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxStep(MailboxProcessor.java:205)
> 2020-05-16T12:41:32.3668665Z  at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:196)
> 2020-05-16T12:41:32.3670150Z  at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:553)
> 2020-05-16T12:41:32.3671277Z  at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:526)
> 2020-05-16T12:41:32.3672268Z  at 
> org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:713)
> 2020-05-16T12:41:32.3673197Z  at 
> org.apache.flink.runtime.taskmanager.Task.run(Task.java:539)
> 2020-05-16T12:41:32.3674129Z  at java.lang.Thread.run(Thread.java:748)
> 2020-05-16T12:41:32.3675215Z Caused by: java.lang.IllegalArgumentException: 
> channel state write result not found for checkpoint id 20
> 2020-05-16T12:41:32.3676320Z  at 
> org.apache.flink.util.Preconditions.checkArgument(Preconditions.java:139)
> 2020-05-16T12:41:32.3677583Z  at 
> org.apache.flink.runtime.checkpoint.channel.ChannelStateWriterImpl.getWriteResult(ChannelStateWriterImpl.java:143)
> 2020-05-16T12:41:32.3679064Z  at 
> org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.takeSnapshotSync(SubtaskCheckpointCoordinatorImpl.java:266)
> 2020-05-16T12:41:32.3680598Z  at 
> org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.checkpointState(SubtaskCheckpointCoordinatorImpl.java:164)
> 2020-05-16T12:41:32.3682003Z  at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$performCheckpoint$5(StreamTask.java:886)
> 2020-05-16T12:41:32.3683320Z  at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:47)
> 2020-05-16T12:41:32.3684798Z  at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:876)
> 2020-05-16T12:41:32.3686020Z  at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:844)
> 2020-05-16T12:41:32.3686848Z  ... 12 more
> 2020-05-16T12:41:32.3687161Z 
> 2020-05-16T12:41:32.3688455Z [ERROR] 
> shouldPerformUnalignedCheckpointOnParallelRemoteChannel(org.apache.flink.test.checkpointing.UnalignedCheckpointITCase)
>   Time elapsed: 9.535 s  <<< ERROR!
> 2020-05-16T12:41:32.3690013Z java.util.concurrent.ExecutionException: 
> org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
> 2020-05-16T12:41:32.3691222Z  at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> 2020-05-16T12:41:32.3692239Z  at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
> 2020-05-16T12:41:32.3693445Z  at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1665)
> 2020-05-16T12:41:32.3695048Z  at 
> org.apache.flink.streaming.api.environment.LocalStreamEnvironment.execute(LocalStreamEnvironment.java:74)
> 2020-05-16T12:41:32.3696464Z  at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1645)
> 2020-05-16T12:41:32.3698062Z  at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1627)
> 2020-05-16T12:41:32.3699413Z  at 
> org.apache.flink.test.checkpointing.UnalignedCheckpointITCase.execute(UnalignedCheckpointITCase.java:158)
> 2020-05-16T12:41:32.3701166Z  at 
> org.apache.flink.test.checkpointing.UnalignedCheckpointITCase.shouldPerformUnalignedCheckpointOnParallelRemoteChannel(UnalignedCheckpointITCase.java:140)
> 2020-05-16T12:41:32.3702419Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2020-05-16T12:41:32.3703350Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2020-05-16T12:41:32.3704969Z  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2020-05-16T12:41:32.3706119Z  at 
> java.lang.reflect.Method.invoke(Method.java:498)
> 2020-05-16T12:41:32.3707211Z  at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> 2020-05-16T12:41:32.3708559Z  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 2020-05-16T12:41:32.3709677Z  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> 2020-05-16T12:41:32.3710773Z  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 2020-05-16T12:41:32.3711753Z  at 
> org.junit.rules.Verifier$1.evaluate(Verifier.java:35)
> 2020-05-16T12:41:32.3712785Z  at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
> 2020-05-16T12:41:32.3714049Z  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
> 2020-05-16T12:41:32.3715748Z  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
> 2020-05-16T12:41:32.3717001Z  at 
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 2020-05-16T12:41:32.3717893Z  at java.lang.Thread.run(Thread.java:748)
> 2020-05-16T12:41:32.3718795Z Caused by: 
> org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
> 2020-05-16T12:41:32.3719891Z  at 
> org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:147)
> 2020-05-16T12:41:32.3721412Z  at 
> org.apache.flink.client.program.PerJobMiniClusterFactory$PerJobMiniClusterJobClient.lambda$getJobExecutionResult$2(PerJobMiniClusterFactory.java:186)
> 2020-05-16T12:41:32.3722853Z  at 
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
> 2020-05-16T12:41:32.3724059Z  at 
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
> 2020-05-16T12:41:32.3725342Z  at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
> 2020-05-16T12:41:32.3726408Z  at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
> 2020-05-16T12:41:32.3727763Z  at 
> org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.lambda$invokeRpc$0(AkkaInvocationHandler.java:229)
> 2020-05-16T12:41:32.3728991Z  at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
> 2020-05-16T12:41:32.3730231Z  at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
> 2020-05-16T12:41:32.3731518Z  at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
> 2020-05-16T12:41:32.3732600Z  at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
> 2020-05-16T12:41:32.3733680Z  at 
> org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:890)
> 2020-05-16T12:41:32.3734977Z  at 
> akka.dispatch.OnComplete.internal(Future.scala:264)
> 2020-05-16T12:41:32.3735847Z  at 
> akka.dispatch.OnComplete.internal(Future.scala:261)
> 2020-05-16T12:41:32.3736701Z  at 
> akka.dispatch.japi$CallbackBridge.apply(Future.scala:191)
> 2020-05-16T12:41:32.3737668Z  at 
> akka.dispatch.japi$CallbackBridge.apply(Future.scala:188)
> 2020-05-16T12:41:32.3738572Z  at 
> scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
> 2020-05-16T12:41:32.3739624Z  at 
> org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:74)
> 2020-05-16T12:41:32.3740754Z  at 
> scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
> 2020-05-16T12:41:32.3741778Z  at 
> scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
> 2020-05-16T12:41:32.3742743Z  at 
> akka.pattern.PromiseActorRef.$bang(AskSupport.scala:572)
> 2020-05-16T12:41:32.3743956Z  at 
> akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:22)
> 2020-05-16T12:41:32.3745268Z  at 
> akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:21)
> 2020-05-16T12:41:32.3746646Z  at 
> scala.concurrent.Future$$anonfun$andThen$1.apply(Future.scala:436)
> 2020-05-16T12:41:32.3747799Z  at 
> scala.concurrent.Future$$anonfun$andThen$1.apply(Future.scala:435)
> 2020-05-16T12:41:32.3748747Z  at 
> scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
> 2020-05-16T12:41:32.3749750Z  at 
> akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
> 2020-05-16T12:41:32.3751177Z  at 
> akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91)
> 2020-05-16T12:41:32.3752463Z  at 
> akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
> 2020-05-16T12:41:32.3753651Z  at 
> akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
> 2020-05-16T12:41:32.3755004Z  at 
> scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
> 2020-05-16T12:41:32.3756026Z  at 
> akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:90)
> 2020-05-16T12:41:32.3757013Z  at 
> akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
> 2020-05-16T12:41:32.3758198Z  at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
> 2020-05-16T12:41:32.3759324Z  at 
> akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> 2020-05-16T12:41:32.3760334Z  at 
> akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> 2020-05-16T12:41:32.3761343Z  at 
> akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> 2020-05-16T12:41:32.3762370Z  at 
> akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 2020-05-16T12:41:32.3763673Z Caused by: 
> org.apache.flink.runtime.JobException: Recovery is suppressed by 
> FixedDelayRestartBackoffTimeStrategy(maxNumberRestartAttempts=5, 
> backoffTimeMS=100)
> 2020-05-16T12:41:32.3765419Z  at 
> org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:112)
> 2020-05-16T12:41:32.3766974Z  at 
> org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:78)
> 2020-05-16T12:41:32.3768440Z  at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:189)
> 2020-05-16T12:41:32.3769691Z  at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:183)
> 2020-05-16T12:41:32.3770992Z  at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:177)
> 2020-05-16T12:41:32.3772437Z  at 
> org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:505)
> 2020-05-16T12:41:32.3773646Z  at 
> org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:386)
> 2020-05-16T12:41:32.3774841Z  at 
> sun.reflect.GeneratedMethodAccessor22.invoke(Unknown Source)
> 2020-05-16T12:41:32.3775836Z  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2020-05-16T12:41:32.3776796Z  at 
> java.lang.reflect.Method.invoke(Method.java:498)
> 2020-05-16T12:41:32.3777872Z  at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:284)
> 2020-05-16T12:41:32.3779026Z  at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:199)
> 2020-05-16T12:41:32.3780203Z  at 
> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
> 2020-05-16T12:41:32.3781384Z  at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152)
> 2020-05-16T12:41:32.3782447Z  at 
> akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
> 2020-05-16T12:41:32.3783577Z  at 
> akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
> 2020-05-16T12:41:32.3784987Z  at 
> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
> 2020-05-16T12:41:32.3786104Z  at 
> akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
> 2020-05-16T12:41:32.3787215Z  at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
> 2020-05-16T12:41:32.3788554Z  at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
> 2020-05-16T12:41:32.3789655Z  at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
> 2020-05-16T12:41:32.3790672Z  at 
> akka.actor.Actor$class.aroundReceive(Actor.scala:517)
> 2020-05-16T12:41:32.3791705Z  at 
> akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
> 2020-05-16T12:41:32.3792746Z  at 
> akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
> 2020-05-16T12:41:32.3793686Z  at 
> akka.actor.ActorCell.invoke(ActorCell.scala:561)
> 2020-05-16T12:41:32.3794857Z  at 
> akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
> 2020-05-16T12:41:32.3795796Z  at akka.dispatch.Mailbox.run(Mailbox.scala:225)
> 2020-05-16T12:41:32.3796678Z  at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
> 2020-05-16T12:41:32.3797348Z  ... 4 more
> 2020-05-16T12:41:32.3798370Z Caused by: java.io.IOException: Could not 
> perform checkpoint 18 for operator Sink: Unnamed (4/5).
> 2020-05-16T12:41:32.3799764Z  at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:857)
> 2020-05-16T12:41:32.3801253Z  at 
> org.apache.flink.streaming.runtime.io.CheckpointBarrierHandler.notifyCheckpoint(CheckpointBarrierHandler.java:107)
> 2020-05-16T12:41:32.3803019Z  at 
> org.apache.flink.streaming.runtime.io.CheckpointBarrierUnaligner$ThreadSafeUnaligner.lambda$notifyBarrierReceived$0(CheckpointBarrierUnaligner.java:309)
> 2020-05-16T12:41:32.3804981Z  at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:47)
> 2020-05-16T12:41:32.3806356Z  at 
> org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:78)
> 2020-05-16T12:41:32.3807744Z  at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:285)
> 2020-05-16T12:41:32.3809218Z  at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxStep(MailboxProcessor.java:205)
> 2020-05-16T12:41:32.3810719Z  at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:196)
> 2020-05-16T12:41:32.3812119Z  at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:553)
> 2020-05-16T12:41:32.3813389Z  at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:526)
> 2020-05-16T12:41:32.3814763Z  at 
> org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:713)
> 2020-05-16T12:41:32.3815805Z  at 
> org.apache.flink.runtime.taskmanager.Task.run(Task.java:539)
> 2020-05-16T12:41:32.3816729Z  at java.lang.Thread.run(Thread.java:748)
> 2020-05-16T12:41:32.3818064Z Caused by: java.lang.IllegalArgumentException: 
> channel state write result not found for checkpoint id 18
> 2020-05-16T12:41:32.3819318Z  at 
> org.apache.flink.util.Preconditions.checkArgument(Preconditions.java:139)
> 2020-05-16T12:41:32.3820670Z  at 
> org.apache.flink.runtime.checkpoint.channel.ChannelStateWriterImpl.getWriteResult(ChannelStateWriterImpl.java:143)
> 2020-05-16T12:41:32.3822364Z  at 
> org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.finishAndReportAsync(SubtaskCheckpointCoordinatorImpl.java:233)
> 2020-05-16T12:41:32.3824252Z  at 
> org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.checkpointState(SubtaskCheckpointCoordinatorImpl.java:165)
> 2020-05-16T12:41:32.3825917Z  at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$performCheckpoint$5(StreamTask.java:886)
> 2020-05-16T12:41:32.3827421Z  at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:47)
> 2020-05-16T12:41:32.3828959Z  at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:876)
> 2020-05-16T12:41:32.3830320Z  at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:844)
> 2020-05-16T12:41:32.3831287Z  ... 12 more
> {code}
> instance: 
> https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_apis/build/builds/1538/logs/114



