[ 
https://issues.apache.org/jira/browse/FLINK-24348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17463451#comment-17463451
 ] 

Martijn Visser commented on FLINK-24348:
----------------------------------------

I did an investigation and I suspect that it's Zookeeper that's causing the 
issues. This is what's in the log for the failed run:


{code:bash}
===> Running preflight checks ... 
===> Launching ... 
===> Launching kafka ... 
[2021-12-15 10:21:42,557] INFO Registered kafka:type=kafka.Log4jController 
MBean (kafka.utils.Log4jControllerRegistration$)
[2021-12-15 10:21:43,120] INFO starting (kafka.server.KafkaServer)
[2021-12-15 10:21:43,120] INFO Connecting to zookeeper on localhost:2181 
(kafka.server.KafkaServer)
[2021-12-15 10:21:43,142] INFO [ZooKeeperClient Kafka server] Initializing a 
new session to localhost:2181. (kafka.zookeeper.ZooKeeperClient)
[2021-12-15 10:21:43,167] INFO [ZooKeeperClient Kafka server] Waiting until 
connected. (kafka.zookeeper.ZooKeeperClient)
[2021-12-15 10:21:43,195] INFO Creating new log file: log.1 
(org.apache.zookeeper.server.persistence.FileTxnLog)
[2021-12-15 10:22:01,168] INFO [ZooKeeperClient Kafka server] Closing. 
(kafka.zookeeper.ZooKeeperClient)
[2021-12-15 10:22:01,289] INFO [ZooKeeperClient Kafka server] Closed. 
(kafka.zookeeper.ZooKeeperClient)
[2021-12-15 10:22:01,291] ERROR Fatal error during KafkaServer startup. Prepare 
to shutdown (kafka.server.KafkaServer)
kafka.zookeeper.ZooKeeperClientTimeoutException: Timed out waiting for 
connection while in state: CONNECTING
        at 
kafka.zookeeper.ZooKeeperClient.waitUntilConnected(ZooKeeperClient.scala:271)
        at kafka.zookeeper.ZooKeeperClient.<init>(ZooKeeperClient.scala:125)
        at kafka.zk.KafkaZkClient$.apply(KafkaZkClient.scala:1948)
        at kafka.server.KafkaServer.createZkClient$1(KafkaServer.scala:431)
        at kafka.server.KafkaServer.initZkClient(KafkaServer.scala:456)
        at kafka.server.KafkaServer.startup(KafkaServer.scala:191)
        at kafka.Kafka$.main(Kafka.scala:109)
        at kafka.Kafka.main(Kafka.scala)
[2021-12-15 10:22:01,294] INFO shutting down (kafka.server.KafkaServer)
[2021-12-15 10:22:01,301] INFO shut down completed (kafka.server.KafkaServer)
[2021-12-15 10:22:01,302] ERROR Exiting Kafka. (kafka.Kafka$)
[2021-12-15 10:22:01,303] INFO shutting down (kafka.server.KafkaServer)
{code}

During successful runs, the logs look like:

{code:bash}
10:21:21,002 [docker-java-stream-1918861308] INFO  
org.apache.flink.connector.kafka.sink.KafkaSinkITCase        [] - STDOUT: ===> 
Running preflight checks ... 
10:21:21,018 [docker-java-stream-1918861308] INFO  
org.apache.flink.connector.kafka.sink.KafkaSinkITCase        [] - STDOUT: ===> 
Launching ... 
10:21:21,030 [docker-java-stream-1918861308] INFO  
org.apache.flink.connector.kafka.sink.KafkaSinkITCase        [] - STDOUT: ===> 
Launching kafka ... 
10:21:21,800 [docker-java-stream-1918861308] INFO  
org.apache.flink.connector.kafka.sink.KafkaSinkITCase        [] - STDOUT: 
[2021-12-15 10:21:21,796] INFO Registered kafka:type=kafka.Log4jController 
MBean (kafka.utils.Log4jControllerRegistration$)
10:21:22,305 [docker-java-stream-1918861308] INFO  
org.apache.flink.connector.kafka.sink.KafkaSinkITCase        [] - STDOUT: 
[2021-12-15 10:21:22,304] INFO starting (kafka.server.KafkaServer)
10:21:22,305 [docker-java-stream-1918861308] INFO  
org.apache.flink.connector.kafka.sink.KafkaSinkITCase        [] - STDOUT: 
[2021-12-15 10:21:22,305] INFO Connecting to zookeeper on localhost:2181 
(kafka.server.KafkaServer)
10:21:22,324 [docker-java-stream-1918861308] INFO  
org.apache.flink.connector.kafka.sink.KafkaSinkITCase        [] - STDOUT: 
[2021-12-15 10:21:22,323] INFO [ZooKeeperClient Kafka server] Initializing a 
new session to localhost:2181. (kafka.zookeeper.ZooKeeperClient)
10:21:22,347 [docker-java-stream-1918861308] INFO  
org.apache.flink.connector.kafka.sink.KafkaSinkITCase        [] - STDOUT: 
[2021-12-15 10:21:22,346] INFO [ZooKeeperClient Kafka server] Waiting until 
connected. (kafka.zookeeper.ZooKeeperClient)
10:21:22,377 [docker-java-stream-1918861308] INFO  
org.apache.flink.connector.kafka.sink.KafkaSinkITCase        [] - STDOUT: 
[2021-12-15 10:21:22,376] INFO Creating new log file: log.1 
(org.apache.zookeeper.server.persistence.FileTxnLog)
10:21:22,417 [docker-java-stream-1918861308] INFO  
org.apache.flink.connector.kafka.sink.KafkaSinkITCase        [] - STDOUT: 
[2021-12-15 10:21:22,411] INFO [ZooKeeperClient Kafka server] Connected. 
(kafka.zookeeper.ZooKeeperClient)
10:21:22,554 [docker-java-stream-1918861308] INFO  
org.apache.flink.connector.kafka.sink.KafkaSinkITCase        [] - STDOUT: 
[2021-12-15 10:21:22,552] INFO [feature-zk-node-event-process-thread]: Starting 
(kafka.server.FinalizedFeatureChangeListener$ChangeNotificationProcessorThread)
10:21:22,570 [docker-java-stream-1918861308] INFO  
org.apache.flink.connector.kafka.sink.KafkaSinkITCase        [] - STDOUT: 
[2021-12-15 10:21:22,570] INFO Feature ZK node at path: /feature does not exist 
(kafka.server.FinalizedFeatureChangeListener)
{code}

When digging a bit deeper through the logs, it appears that Zookeeper and/or 
Kafka runs into network issues and therefore shuts down. I'm leaning towards 
network hickups on the CI machines that cause these occurrences which we can't 
really resolve unless we decide to really investigate what's causing network 
connectivity on the CI machines. 

> Kafka ITCases (e.g. KafkaTableITCase) fail with "ContainerLaunch Container 
> startup failed"
> ------------------------------------------------------------------------------------------
>
>                 Key: FLINK-24348
>                 URL: https://issues.apache.org/jira/browse/FLINK-24348
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / Kafka
>    Affects Versions: 1.14.0, 1.15.0
>            Reporter: Dawid Wysakowicz
>            Assignee: Martijn Visser
>            Priority: Critical
>              Labels: pull-request-available, test-stability
>             Fix For: 1.15.0, 1.14.3
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=24338&view=logs&j=b0097207-033c-5d9a-b48c-6d4796fbe60d&t=8338a7d2-16f7-52e5-f576-4b7b3071eb3d&l=7140
> {code}
> Sep 21 02:44:33 org.testcontainers.containers.ContainerLaunchException: 
> Container startup failed
> Sep 21 02:44:33       at 
> org.testcontainers.containers.GenericContainer.doStart(GenericContainer.java:334)
> Sep 21 02:44:33       at 
> org.testcontainers.containers.KafkaContainer.doStart(KafkaContainer.java:97)
> Sep 21 02:44:33       at 
> org.apache.flink.streaming.connectors.kafka.table.KafkaTableTestBase$1.doStart(KafkaTableTestBase.java:71)
> Sep 21 02:44:33       at 
> org.testcontainers.containers.GenericContainer.start(GenericContainer.java:315)
> Sep 21 02:44:33       at 
> org.testcontainers.containers.GenericContainer.starting(GenericContainer.java:1060)
> Sep 21 02:44:33       at 
> org.testcontainers.containers.FailureDetectingExternalResource$1.evaluate(FailureDetectingExternalResource.java:29)
> Sep 21 02:44:33       at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
> Sep 21 02:44:33       at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
> Sep 21 02:44:33       at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> Sep 21 02:44:33       at 
> org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> Sep 21 02:44:33       at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:413)
> Sep 21 02:44:33       at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
> Sep 21 02:44:33       at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
> Sep 21 02:44:33       at 
> org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:43)
> Sep 21 02:44:33       at 
> java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
> Sep 21 02:44:33       at 
> java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
> Sep 21 02:44:33       at 
> java.util.Iterator.forEachRemaining(Iterator.java:116)
> Sep 21 02:44:33       at 
> java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
> Sep 21 02:44:33       at 
> java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
> Sep 21 02:44:33       at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
> Sep 21 02:44:33       at 
> java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
> Sep 21 02:44:33       at 
> java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
> Sep 21 02:44:33       at 
> java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> Sep 21 02:44:33       at 
> java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
> Sep 21 02:44:33       at 
> org.junit.vintage.engine.VintageTestEngine.executeAllChildren(VintageTestEngine.java:82)
> Sep 21 02:44:33       at 
> org.junit.vintage.engine.VintageTestEngine.execute(VintageTestEngine.java:73)
> Sep 21 02:44:33       at 
> org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:220)
> Sep 21 02:44:33       at 
> org.junit.platform.launcher.core.DefaultLauncher.lambda$execute$6(DefaultLauncher.java:188)
> Sep 21 02:44:33       at 
> org.junit.platform.launcher.core.DefaultLauncher.withInterceptedStreams(DefaultLauncher.java:202)
> Sep 21 02:44:33       at 
> org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:181)
> Sep 21 02:44:33       at 
> org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:128)
> Sep 21 02:44:33       at 
> org.apache.maven.surefire.junitplatform.JUnitPlatformProvider.invokeAllTests(JUnitPlatformProvider.java:150)
> Sep 21 02:44:33       at 
> org.apache.maven.surefire.junitplatform.JUnitPlatformProvider.invoke(JUnitPlatformProvider.java:120)
> Sep 21 02:44:33       at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
> Sep 21 02:44:33       at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
> Sep 21 02:44:33       at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
> Sep 21 02:44:33       at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> Sep 21 02:44:33 Caused by: org.rnorth.ducttape.RetryCountExceededException: 
> Retry limit hit with exception
> Sep 21 02:44:33       at 
> org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:88)
> Sep 21 02:44:33       at 
> org.testcontainers.containers.GenericContainer.doStart(GenericContainer.java:327)
> Sep 21 02:44:33       ... 36 more
> Sep 21 02:44:33 Caused by: 
> org.testcontainers.containers.ContainerLaunchException: Could not 
> create/start container
> Sep 21 02:44:33       at 
> org.testcontainers.containers.GenericContainer.tryStart(GenericContainer.java:523)
> Sep 21 02:44:33       at 
> org.testcontainers.containers.GenericContainer.lambda$doStart$0(GenericContainer.java:329)
> Sep 21 02:44:33       at 
> org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:81)
> Sep 21 02:44:33       ... 37 more
> Sep 21 02:44:33 Caused by: java.lang.IllegalStateException: Container did not 
> start correctly.
> Sep 21 02:44:33       at 
> org.testcontainers.containers.GenericContainer.tryStart(GenericContainer.java:461)
> Sep 21 02:44:33       ... 39 more
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to