Greg Hogan created FLINK-4117:
---------------------------------

             Summary: Wait for CuratorFramework connection to be established
                 Key: FLINK-4117
                 URL: https://issues.apache.org/jira/browse/FLINK-4117
             Project: Flink
          Issue Type: Bug
          Components: State Backends, Checkpointing
    Affects Versions: 1.1.0
            Reporter: Greg Hogan
I received the following error when running {{mvn verify}} locally. Searching for the error suggests that we do not wait for the ZooKeeper connection to be established, which happens asynchronously. In ZookeeperUtils.java:98 we call {{CuratorFramework.start()}}; we could follow that with a call to {{CuratorFramework.blockUntilConnected}} using the same timeout.

{code}
Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 323.326 sec <<< FAILURE! - in org.apache.flink.runtime.checkpoint.CheckpointIDCounterTest$ZooKeeperCheckpointIDCounterITCase
testConcurrentGetAndIncrement(org.apache.flink.runtime.checkpoint.CheckpointIDCounterTest$ZooKeeperCheckpointIDCounterITCase)  Time elapsed: 266.521 sec  <<< ERROR!
java.util.concurrent.ExecutionException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /flink/checkpoint-id-counter
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
	at org.apache.flink.runtime.checkpoint.CheckpointIDCounterTest.testConcurrentGetAndIncrement(CheckpointIDCounterTest.java:129)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:283)
	at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:173)
	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:128)
	at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:203)
	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:155)
	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /flink/checkpoint-id-counter
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
	at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
	at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:302)
	at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:291)
	at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
	at org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:288)
	at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:279)
	at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:41)
	at org.apache.curator.framework.recipes.shared.SharedValue.readValue(SharedValue.java:244)
	at org.apache.curator.framework.recipes.shared.SharedValue.trySetValue(SharedValue.java:177)
	at org.apache.curator.framework.recipes.shared.SharedCount.trySetCount(SharedCount.java:111)
	at org.apache.flink.runtime.checkpoint.ZooKeeperCheckpointIDCounter.getAndIncrement(ZooKeeperCheckpointIDCounter.java:121)
	at org.apache.flink.runtime.checkpoint.CheckpointIDCounterTest$Incrementer.call(CheckpointIDCounterTest.java:201)
	at org.apache.flink.runtime.checkpoint.CheckpointIDCounterTest$Incrementer.call(CheckpointIDCounterTest.java:178)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 375.259 sec - in org.apache.flink.runtime.operators.sort.ExternalSortLargeRecordsITCase

Results :

Tests in error: 
  CheckpointIDCounterTest$ZooKeeperCheckpointIDCounterITCase>CheckpointIDCounterTest.testConcurrentGetAndIncrement:129 » Execution
{code}
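The proposed fix is to follow {{start()}} with a bounded wait for the connection. The sketch below models that "start, then block until connected with a timeout" pattern using only the JDK, so it can run without a ZooKeeper server. {{AsyncClient}}, {{startAndWait}} and the simulated connect delay are hypothetical stand-ins, not Curator or Flink code; with Curator itself, the equivalent step would be calling {{blockUntilConnected}} (which takes a maximum wait and a {{TimeUnit}} and returns whether the connection was established) right after {{start()}}. Note that such a wait only guards the initial connection, not a connection that is lost later.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class BlockUntilConnectedSketch {

    /** Hypothetical stand-in for an asynchronously connecting client (not Curator's API). */
    static final class AsyncClient {
        private final CountDownLatch connected = new CountDownLatch(1);
        private final long connectDelayMs;

        AsyncClient(long connectDelayMs) {
            this.connectDelayMs = connectDelayMs;
        }

        /** Begins connecting on a background thread and returns immediately. */
        void start() {
            Thread t = new Thread(() -> {
                try {
                    Thread.sleep(connectDelayMs); // simulated connection latency
                    connected.countDown();        // mark the connection as established
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.setDaemon(true); // do not keep the JVM alive for a slow "connection"
            t.start();
        }

        boolean isConnected() {
            return connected.getCount() == 0;
        }

        /** Waits up to the given time; returns true only if the connection came up. */
        boolean blockUntilConnected(long maxWait, TimeUnit unit) throws InterruptedException {
            return connected.await(maxWait, unit);
        }
    }

    /** Starts the client and fails fast if it is not connected within timeoutMs. */
    static AsyncClient startAndWait(long connectDelayMs, long timeoutMs) {
        AsyncClient client = new AsyncClient(connectDelayMs);
        client.start();
        boolean ok;
        try {
            ok = client.blockUntilConnected(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for connection", e);
        }
        if (!ok) {
            throw new IllegalStateException("Not connected after " + timeoutMs + " ms");
        }
        return client;
    }

    public static void main(String[] args) {
        AsyncClient fast = startAndWait(50, 5_000);
        System.out.println("connected: " + fast.isConnected());

        try {
            startAndWait(5_000, 100); // connection slower than the timeout
        } catch (IllegalStateException e) {
            System.out.println("failed fast: " + e.getMessage());
        }
    }
}
```

Failing fast with a clear message at startup is arguably preferable to letting the first ZooKeeper operation surface a {{ConnectionLossException}} much later, as in the trace above.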