[ 
https://issues.apache.org/jira/browse/FLINK-34206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17810799#comment-17810799
 ] 

Matthias Pohl commented on FLINK-34206:
---------------------------------------

I'm not able to pinpoint the cause of this test failure. The test itself hasn't 
been touched for a while, and the AdaptiveBatchScheduler it is based on hasn't 
been touched for some time either. I'm raising the priority to Blocker until 
the cause of this issue is identified. [~lincoln] [~yunta] [~jingge] 
[~martijnvisser] Any idea who we could ping?

> CacheITCase.testRetryOnCorruptedClusterDataset(Path) failed
> -----------------------------------------------------------
>
>                 Key: FLINK-34206
>                 URL: https://issues.apache.org/jira/browse/FLINK-34206
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.19.0
>            Reporter: Matthias Pohl
>            Priority: Blocker
>              Labels: test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56728&view=logs&j=a657ddbf-d986-5381-9649-342d9c92e7fb&t=dc085d4a-05c8-580e-06ab-21f5624dab16&l=8763
> {code}
> Jan 23 01:39:48 01:39:48.152 [ERROR] Tests run: 6, Failures: 0, Errors: 1, 
> Skipped: 0, Time elapsed: 19.24 s <<< FAILURE! -- in 
> org.apache.flink.test.streaming.runtime.CacheITCase
> Jan 23 01:39:48 01:39:48.152 [ERROR] 
> org.apache.flink.test.streaming.runtime.CacheITCase.testRetryOnCorruptedClusterDataset(Path)
>  -- Time elapsed: 4.755 s <<< ERROR!
> Jan 23 01:39:48 org.apache.flink.runtime.client.JobExecutionException: Job 
> execution failed.
> Jan 23 01:39:48       at 
> org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:144)
> Jan 23 01:39:48       at 
> org.apache.flink.runtime.minicluster.MiniClusterJobClient.lambda$getJobExecutionResult$3(MiniClusterJobClient.java:141)
> Jan 23 01:39:48       at 
> java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:646)
> Jan 23 01:39:48       at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
> Jan 23 01:39:48       at 
> java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179)
> Jan 23 01:39:48       at 
> org.apache.flink.runtime.rpc.pekko.PekkoInvocationHandler.lambda$invokeRpc$1(PekkoInvocationHandler.java:268)
> Jan 23 01:39:48       at 
> java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
> Jan 23 01:39:48       at 
> java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
> Jan 23 01:39:48       at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
> Jan 23 01:39:48       at 
> java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179)
> Jan 23 01:39:48       at 
> org.apache.flink.util.concurrent.FutureUtils.doForward(FutureUtils.java:1287)
> Jan 23 01:39:48       at 
> org.apache.flink.runtime.concurrent.ClassLoadingUtils.lambda$guardCompletionWithContextClassLoader$1(ClassLoadingUtils.java:93)
> Jan 23 01:39:48       at 
> org.apache.flink.runtime.concurrent.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68)
> Jan 23 01:39:48       at 
> org.apache.flink.runtime.concurrent.ClassLoadingUtils.lambda$guardCompletionWithContextClassLoader$2(ClassLoadingUtils.java:92)
> Jan 23 01:39:48       at 
> java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
> Jan 23 01:39:48       at 
> java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
> Jan 23 01:39:48       at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
> Jan 23 01:39:48       at 
> java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179)
> Jan 23 01:39:48       at 
> org.apache.flink.runtime.concurrent.pekko.ScalaFutureUtils$1.onComplete(ScalaFutureUtils.java:47)
> Jan 23 01:39:48       at 
> org.apache.pekko.dispatch.OnComplete.internal(Future.scala:310)
> Jan 23 01:39:48       at 
> org.apache.pekko.dispatch.OnComplete.internal(Future.scala:307)
> Jan 23 01:39:48       at 
> org.apache.pekko.dispatch.japi$CallbackBridge.apply(Future.scala:234)
> Jan 23 01:39:48       at 
> org.apache.pekko.dispatch.japi$CallbackBridge.apply(Future.scala:231)
> Jan 23 01:39:48       at 
> scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
> Jan 23 01:39:48       at 
> org.apache.flink.runtime.concurrent.pekko.ScalaFutureUtils$DirectExecutionContext.execute(ScalaFutureUtils.java:65)
> Jan 23 01:39:48       at 
> scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:72)
> Jan 23 01:39:48       at 
> scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:288)
> Jan 23 01:39:48       at 
> scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:288)
> Jan 23 01:39:48       at 
> scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:288)
> Jan 23 01:39:48       at 
> org.apache.pekko.pattern.PromiseActorRef.$bang(AskSupport.scala:629)
> Jan 23 01:39:48       at 
> org.apache.pekko.remote.DefaultMessageDispatcher.dispatch(Endpoint.scala:128)
> Jan 23 01:39:48       at 
> org.apache.pekko.remote.EndpointReader$$anonfun$receive$2.applyOrElse(Endpoint.scala:1154)
> Jan 23 01:39:48       at 
> org.apache.pekko.actor.Actor.aroundReceive(Actor.scala:547)
> Jan 23 01:39:48       at 
> org.apache.pekko.actor.Actor.aroundReceive$(Actor.scala:545)
> Jan 23 01:39:48       at 
> org.apache.pekko.remote.EndpointActor.aroundReceive(Endpoint.scala:550)
> Jan 23 01:39:48       at 
> org.apache.pekko.actor.ActorCell.receiveMessage(ActorCell.scala:590)
> Jan 23 01:39:48       at 
> org.apache.pekko.actor.ActorCell.invoke(ActorCell.scala:557)
> Jan 23 01:39:48       at 
> org.apache.pekko.dispatch.Mailbox.processMailbox(Mailbox.scala:280)
> Jan 23 01:39:48       at 
> org.apache.pekko.dispatch.Mailbox.run(Mailbox.scala:241)
> Jan 23 01:39:48       at 
> org.apache.pekko.dispatch.Mailbox.exec(Mailbox.scala:253)
> Jan 23 01:39:48       at 
> java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:387)
> Jan 23 01:39:48       at 
> java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1312)
> Jan 23 01:39:48       at 
> java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1843)
> Jan 23 01:39:48       at 
> java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1808)
> Jan 23 01:39:48       at 
> java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:188)
> Jan 23 01:39:48 Caused by: 
> org.apache.flink.runtime.execution.SuppressRestartsException: Unrecoverable 
> failure. This suppresses job restarts. Please check the stack trace for the 
> root cause.
> Jan 23 01:39:48       at 
> org.apache.flink.runtime.scheduler.adaptivebatch.AdaptiveBatchScheduler.lambda$tryComputeSourceParallelismThenRunAsync$7(AdaptiveBatchScheduler.java:285)
> Jan 23 01:39:48       at 
> java.base/java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:990)
> Jan 23 01:39:48       at 
> java.base/java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:974)
> Jan 23 01:39:48       at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
> Jan 23 01:39:48       at 
> java.base/java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:614)
> Jan 23 01:39:48       at 
> java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:844)
> Jan 23 01:39:48       at 
> java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:482)
> Jan 23 01:39:48       at 
> org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.lambda$handleRunAsync$4(PekkoRpcActor.java:451)
> Jan 23 01:39:48       at 
> org.apache.flink.runtime.concurrent.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68)
> Jan 23 01:39:48       at 
> org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleRunAsync(PekkoRpcActor.java:451)
> Jan 23 01:39:48       at 
> org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleRpcMessage(PekkoRpcActor.java:218)
> Jan 23 01:39:48       at 
> org.apache.flink.runtime.rpc.pekko.FencedPekkoRpcActor.handleRpcMessage(FencedPekkoRpcActor.java:85)
> Jan 23 01:39:48       at 
> org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleMessage(PekkoRpcActor.java:168)
> Jan 23 01:39:48       at 
> org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:33)
> Jan 23 01:39:48       at 
> org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:29)
> Jan 23 01:39:48       at 
> scala.PartialFunction.applyOrElse(PartialFunction.scala:127)
> Jan 23 01:39:48       at 
> scala.PartialFunction.applyOrElse$(PartialFunction.scala:126)
> Jan 23 01:39:48       at 
> org.apache.pekko.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:29)
> Jan 23 01:39:48       at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:175)
> Jan 23 01:39:48       at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176)
> Jan 23 01:39:48       at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176)
> Jan 23 01:39:48       at 
> org.apache.pekko.actor.Actor.aroundReceive(Actor.scala:547)
> Jan 23 01:39:48       at 
> org.apache.pekko.actor.Actor.aroundReceive$(Actor.scala:545)
> Jan 23 01:39:48       at 
> org.apache.pekko.actor.AbstractActor.aroundReceive(AbstractActor.scala:229)
> Jan 23 01:39:48       ... 10 more
> Jan 23 01:39:48 Caused by: java.util.concurrent.CompletionException: 
> java.lang.IllegalStateException: Expected execution 
> c2dc985dca4e7acdbcba039b20654a06_306d8342cb5b2ad8b53f1be57f65bee8_1_0 to be 
> in CREATED state, was: CANCELED
> Jan 23 01:39:48       at 
> java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:332)
> Jan 23 01:39:48       at 
> java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:347)
> Jan 23 01:39:48       at 
> java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:874)
> Jan 23 01:39:48       at 
> java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
> Jan 23 01:39:48       ... 28 more
> Jan 23 01:39:48 Caused by: java.lang.IllegalStateException: Expected 
> execution 
> c2dc985dca4e7acdbcba039b20654a06_306d8342cb5b2ad8b53f1be57f65bee8_1_0 to be 
> in CREATED state, was: CANCELED
> Jan 23 01:39:48       at 
> org.apache.flink.util.Preconditions.checkState(Preconditions.java:215)
> Jan 23 01:39:48       at 
> org.apache.flink.runtime.scheduler.DefaultExecutionDeployer.lambda$validateExecutionStates$0(DefaultExecutionDeployer.java:110)
> Jan 23 01:39:48       at 
> java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
> Jan 23 01:39:48       at 
> org.apache.flink.runtime.scheduler.DefaultExecutionDeployer.validateExecutionStates(DefaultExecutionDeployer.java:108)
> Jan 23 01:39:48       at 
> org.apache.flink.runtime.scheduler.DefaultExecutionDeployer.allocateSlotsAndDeploy(DefaultExecutionDeployer.java:93)
> Jan 23 01:39:48       at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.allocateSlotsAndDeploy(DefaultScheduler.java:475)
> Jan 23 01:39:48       at 
> org.apache.flink.runtime.scheduler.adaptivebatch.AdaptiveBatchScheduler.allocateSlotsAndDeploy(AdaptiveBatchScheduler.java:237)
> Jan 23 01:39:48       at 
> org.apache.flink.runtime.scheduler.strategy.VertexwiseSchedulingStrategy.lambda$scheduleVerticesOneByOne$3(VertexwiseSchedulingStrategy.java:207)
> Jan 23 01:39:48       at 
> java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
> Jan 23 01:39:48       at 
> org.apache.flink.runtime.scheduler.strategy.VertexwiseSchedulingStrategy.scheduleVerticesOneByOne(VertexwiseSchedulingStrategy.java:206)
> Jan 23 01:39:48       at 
> org.apache.flink.runtime.scheduler.strategy.VertexwiseSchedulingStrategy.maybeScheduleVertices(VertexwiseSchedulingStrategy.java:145)
> Jan 23 01:39:48       at 
> org.apache.flink.runtime.scheduler.strategy.VertexwiseSchedulingStrategy.onExecutionStateChange(VertexwiseSchedulingStrategy.java:114)
> Jan 23 01:39:48       at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.onTaskFinished(DefaultScheduler.java:251)
> Jan 23 01:39:48       at 
> org.apache.flink.runtime.scheduler.adaptivebatch.AdaptiveBatchScheduler.lambda$onTaskFinished$1(AdaptiveBatchScheduler.java:202)
> Jan 23 01:39:48       at 
> java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
> Jan 23 01:39:48       ... 29 more
> {code}
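
For readers skimming the trace above: the innermost cause is a state precondition that fires while the scheduler is about to deploy executions. One execution was expected to still be in CREATED state but had already been CANCELED. The following is a minimal sketch of that kind of check, not Flink's actual code. `State`, `Execution` and `ExecutionStateCheckSketch` are hypothetical stand-ins; only the error message format is taken from the trace.

```java
import java.util.List;

public class ExecutionStateCheckSketch {

    // Hypothetical stand-in for the execution states; only the values needed here.
    enum State { CREATED, SCHEDULED, CANCELED }

    // Hypothetical stand-in for a single execution attempt.
    static final class Execution {
        final String id;
        final State state;

        Execution(String id, State state) {
            this.id = id;
            this.state = state;
        }
    }

    // Every execution handed over for deployment must still be CREATED;
    // otherwise fail with a message shaped like the one in the trace.
    static void validateExecutionStates(List<Execution> executions) {
        for (Execution e : executions) {
            if (e.state != State.CREATED) {
                throw new IllegalStateException(
                        "Expected execution " + e.id
                                + " to be in CREATED state, was: " + e.state);
            }
        }
    }

    public static void main(String[] args) {
        try {
            // The second execution was canceled before deployment was attempted.
            validateExecutionStates(List.of(
                    new Execution("attempt_0", State.CREATED),
                    new Execution("attempt_1", State.CANCELED)));
        } catch (IllegalStateException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
```

The sketch only shows what the check rejects. It says nothing about why the execution was canceled before deployment in the failing run, which is the open question of this issue.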



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
