[ https://issues.apache.org/jira/browse/FLINK-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488285#comment-17488285 ]
Roman Khachatryan edited comment on FLINK-23240 at 2/7/22, 6:08 PM:
--------------------------------------------------------------------

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=30857&view=logs&j=a57e0635-3fad-5b08-57c7-a4142d7d6fa9&t=2ef0effc-1da1-50e5-c2bd-aab434b1c5b7&l=24001

This is 1.15. The test didn't time out but hung, and then the whole build timed out.
{code}
16:29:42,407 [ pool-15-thread-1] WARN org.apache.flink.runtime.minicluster.MiniCluster [] - Error in MiniCluster. Shutting the MiniCluster down.
org.apache.flink.util.FlinkException: Unexpected termination of ResourceManagerService.
    at org.apache.flink.runtime.entrypoint.component.DispatcherResourceManagerComponent.lambda$handleUnexpectedResourceManagerTermination$0(DispatcherResourceManagerComponent.java:104) ~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_292]
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_292]
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_292]
    at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975) ~[?:1.8.0_292]
    at org.apache.flink.util.concurrent.FutureUtils.doForward(FutureUtils.java:1389) ~[flink-core-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.util.concurrent.FutureUtils.lambda$forwardTo$24(FutureUtils.java:1372) ~[flink-core-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_292]
    at java.util.concurrent.CompletableFuture.uniWhenCompleteStage(CompletableFuture.java:792) ~[?:1.8.0_292]
    at java.util.concurrent.CompletableFuture.whenComplete(CompletableFuture.java:2153) ~[?:1.8.0_292]
    at org.apache.flink.util.concurrent.FutureUtils.forward(FutureUtils.java:1342) ~[flink-core-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.runtime.resourcemanager.ResourceManagerServiceImpl.closeAsync(ResourceManagerServiceImpl.java:165) ~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.runtime.resourcemanager.ResourceManagerServiceImpl.lambda$revokeLeadership$2(ResourceManagerServiceImpl.java:221) ~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_292]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_292]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292]
{code}

was (Author: roman_khachatryan):
    https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=30857&view=logs&j=a57e0635-3fad-5b08-57c7-a4142d7d6fa9&t=2ef0effc-1da1-50e5-c2bd-aab434b1c5b7&l=24001

This is 1.15. The test didn't time out but hung, and then the whole build timed out:

> ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsWithLocalRecoveryZookeeper fails on azure
> -----------------------------------------------------------------------------------------------------
>
> Key: FLINK-23240
> URL: https://issues.apache.org/jira/browse/FLINK-23240
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Checkpointing
> Affects Versions: 1.14.0
> Reporter: Xintong Song
> Priority: Major
> Labels: test-stability
> Fix For: 1.15.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=19872&view=logs&j=b0a398c0-685b-599c-eb57-c8c2a771138e&t=d13f554f-d4b9-50f8-30ee-d49c6fb0b3cc&l=10186
> {code}
> Jul 04 22:17:29 [ERROR] Tests run: 12, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 91.407 s <<< FAILURE! - in org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase
> Jul 04 22:17:29 [ERROR] testExternalizedFSCheckpointsWithLocalRecoveryZookeeper(org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase) Time elapsed: 31.356 s <<< ERROR!
> Jul 04 22:17:29 java.util.concurrent.ExecutionException: java.util.concurrent.TimeoutException: Invocation of public abstract java.util.concurrent.CompletableFuture org.apache.flink.runtime.webmonitor.RestfulGateway.cancelJob(org.apache.flink.api.common.JobID,org.apache.flink.api.common.time.Time) timed out.
> Jul 04 22:17:29     at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
> Jul 04 22:17:29     at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
> Jul 04 22:17:29     at org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.runJobAndGetExternalizedCheckpoint(ResumeCheckpointManuallyITCase.java:303)
> Jul 04 22:17:29     at org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedCheckpoints(ResumeCheckpointManuallyITCase.java:275)
> Jul 04 22:17:29     at org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsWithLocalRecoveryZookeeper(ResumeCheckpointManuallyITCase.java:215)
> Jul 04 22:17:29     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> Jul 04 22:17:29     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> Jul 04 22:17:29     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> Jul 04 22:17:29     at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> Jul 04 22:17:29     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> Jul 04 22:17:29     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> Jul 04 22:17:29     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> Jul 04 22:17:29     at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> Jul 04 22:17:29     at org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
> Jul 04 22:17:29     at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
> Jul 04 22:17:29     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> Jul 04 22:17:29     at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
> Jul 04 22:17:29     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
> Jul 04 22:17:29     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
> Jul 04 22:17:29     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
> Jul 04 22:17:29     at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
> Jul 04 22:17:29     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
> Jul 04 22:17:29     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
> Jul 04 22:17:29     at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
> Jul 04 22:17:29     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
> Jul 04 22:17:29     at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
> Jul 04 22:17:29     at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> Jul 04 22:17:29     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> Jul 04 22:17:29     at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
> Jul 04 22:17:29     at org.junit.runners.Suite.runChild(Suite.java:128)
> Jul 04 22:17:29     at org.junit.runners.Suite.runChild(Suite.java:27)
> Jul 04 22:17:29     at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
> Jul 04 22:17:29     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
> Jul 04 22:17:29     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
> Jul 04 22:17:29     at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
> Jul 04 22:17:29     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
> Jul 04 22:17:29     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> Jul 04 22:17:29     at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
> Jul 04 22:17:29     at org.apache.maven.surefire.junitcore.JUnitCore.run(JUnitCore.java:55)
> Jul 04 22:17:29     at org.apache.maven.surefire.junitcore.JUnitCoreWrapper.createRequestAndRun(JUnitCoreWrapper.java:137)
> Jul 04 22:17:29     at org.apache.maven.surefire.junitcore.JUnitCoreWrapper.executeEager(JUnitCoreWrapper.java:107)
> Jul 04 22:17:29     at org.apache.maven.surefire.junitcore.JUnitCoreWrapper.execute(JUnitCoreWrapper.java:83)
> Jul 04 22:17:29     at org.apache.maven.surefire.junitcore.JUnitCoreWrapper.execute(JUnitCoreWrapper.java:75)
> Jul 04 22:17:29     at org.apache.maven.surefire.junitcore.JUnitCoreProvider.invoke(JUnitCoreProvider.java:158)
> Jul 04 22:17:29     at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
> Jul 04 22:17:29     at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
> Jul 04 22:17:29     at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
> Jul 04 22:17:29     at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> Jul 04 22:17:29 Caused by: java.util.concurrent.TimeoutException: Invocation of public abstract java.util.concurrent.CompletableFuture org.apache.flink.runtime.webmonitor.RestfulGateway.cancelJob(org.apache.flink.api.common.JobID,org.apache.flink.api.common.time.Time) timed out.
> Jul 04 22:17:29     at com.sun.proxy.$Proxy30.cancelJob(Unknown Source)
> Jul 04 22:17:29     at org.apache.flink.runtime.minicluster.MiniCluster.lambda$cancelJob$7(MiniCluster.java:716)
> Jul 04 22:17:29     at java.base/java.util.concurrent.CompletableFuture.uniApplyNow(CompletableFuture.java:680)
> Jul 04 22:17:29     at java.base/java.util.concurrent.CompletableFuture.uniApplyStage(CompletableFuture.java:658)
> Jul 04 22:17:29     at java.base/java.util.concurrent.CompletableFuture.thenApply(CompletableFuture.java:2094)
> Jul 04 22:17:29     at org.apache.flink.runtime.minicluster.MiniCluster.runDispatcherCommand(MiniCluster.java:758)
> Jul 04 22:17:29     at org.apache.flink.runtime.minicluster.MiniCluster.cancelJob(MiniCluster.java:715)
> Jul 04 22:17:29     at org.apache.flink.client.program.MiniClusterClient.cancel(MiniClusterClient.java:83)
> Jul 04 22:17:29     ... 46 more
> Jul 04 22:17:29 Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/rpc/dispatcher_2#-1806874751]] after [10000 ms]. Message of type [org.apache.flink.runtime.rpc.messages.LocalFencedMessage]. A typical reason for `AskTimeoutException` is that the recipient actor didn't send a reply.
> Jul 04 22:17:29     at akka.pattern.PromiseActorRef$$anonfun$2.apply(AskSupport.scala:635)
> Jul 04 22:17:29     at akka.pattern.PromiseActorRef$$anonfun$2.apply(AskSupport.scala:635)
> Jul 04 22:17:29     at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:648)
> Jul 04 22:17:29     at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:205)
> Jul 04 22:17:29     at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
> Jul 04 22:17:29     at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
> Jul 04 22:17:29     at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
> Jul 04 22:17:29     at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:328)
> Jul 04 22:17:29     at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:279)
> Jul 04 22:17:29     at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:283)
> Jul 04 22:17:29     at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:235)
> Jul 04 22:17:29     at java.base/java.lang.Thread.run(Thread.java:834)
> {code}

--
This message was sent by Atlassian Jira (v8.20.1#820001)