George Yang created KAFKA-18386:
-----------------------------------
Summary: MirrorMaker 2 pod enters CrashLoopBackOff when one DC is powered off
Key: KAFKA-18386
URL: https://issues.apache.org/jira/browse/KAFKA-18386
Project: Kafka
Issue Type: Bug
Components: mirrormaker
Affects Versions: 3.7.1
Reporter: George Yang
In a Kubernetes deployment of MirrorMaker v3.7.1 with one Kafka node in each
data center (DC1 and DC2), powering off DC1 causes the MirrorMaker 2 pod in DC2
to enter CrashLoopBackOff. This issue is different from the one described in
KAFKA-17784. The log from the failing pod is below:
```log
[2025-01-01 08:05:53,432] WARN [AdminClient clientId=dc64->dc88] Connection to node -1 (/192.168.2.88:13399) could not be established. Node may not be available. (org.apache.kafka.clients.NetworkClient:830)[kafka-admin-client-thread | dc64->dc88]
[2025-01-01 08:05:55,652] INFO [AdminClient clientId=dc64->dc88] Metadata update failed (org.apache.kafka.clients.admin.internals.AdminMetadataManager:267)[kafka-admin-client-thread | dc64->dc88]
org.apache.kafka.common.errors.TimeoutException: Timed out waiting to send the call. Call: fetchMetadata
[2025-01-01 08:05:55,653] INFO App info kafka.admin.client for dc64->dc88 unregistered (org.apache.kafka.common.utils.AppInfoParser:88)[kafka-admin-client-thread | dc64->dc88]
[2025-01-01 08:05:55,653] INFO [AdminClient clientId=dc64->dc88] Metadata update failed (org.apache.kafka.clients.admin.internals.AdminMetadataManager:267)[kafka-admin-client-thread | dc64->dc88]
org.apache.kafka.common.errors.TimeoutException: Timed out waiting to send the call. Call: fetchMetadata
[2025-01-01 08:05:55,653] INFO [AdminClient clientId=dc64->dc88] Timed out 1 remaining operation(s) during close. (org.apache.kafka.clients.admin.KafkaAdminClient:1450)[kafka-admin-client-thread | dc64->dc88]
[2025-01-01 08:05:55,657] INFO Metrics scheduler closed (org.apache.kafka.common.metrics.Metrics:684)[kafka-admin-client-thread | dc64->dc88]
[2025-01-01 08:05:55,658] INFO Closing reporter org.apache.kafka.common.metrics.JmxReporter (org.apache.kafka.common.metrics.Metrics:688)[kafka-admin-client-thread | dc64->dc88]
[2025-01-01 08:05:55,658] INFO Metrics reporters closed (org.apache.kafka.common.metrics.Metrics:694)[kafka-admin-client-thread | dc64->dc88]
[2025-01-01 08:05:55,658] ERROR Stopping due to error (org.apache.kafka.connect.mirror.MirrorMaker:360)[main]
org.apache.kafka.connect.errors.ConnectException: Failed to connect to and describe Kafka cluster. Check worker's broker connection and security properties.
    at org.apache.kafka.connect.runtime.WorkerConfig.lookupKafkaClusterId(WorkerConfig.java:305)
    at org.apache.kafka.connect.runtime.WorkerConfig.lookupKafkaClusterId(WorkerConfig.java:285)
    at org.apache.kafka.connect.runtime.WorkerConfig.kafkaClusterId(WorkerConfig.java:415)
    at org.apache.kafka.connect.mirror.MirrorMaker.addHerder(MirrorMaker.java:252)
    at java.base/java.lang.Iterable.forEach(Unknown Source)
    at org.apache.kafka.connect.mirror.MirrorMaker.<init>(MirrorMaker.java:158)
    at org.apache.kafka.connect.mirror.MirrorMaker.<init>(MirrorMaker.java:170)
    at org.apache.kafka.connect.mirror.MirrorMaker.<init>(MirrorMaker.java:174)
    at org.apache.kafka.connect.mirror.MirrorMaker.main(MirrorMaker.java:347)
Caused by: java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: listNodes
    at java.base/java.util.concurrent.CompletableFuture.reportGet(Unknown Source)
    at java.base/java.util.concurrent.CompletableFuture.get(Unknown Source)
    at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:165)
    at org.apache.kafka.connect.runtime.WorkerConfig.lookupKafkaClusterId(WorkerConfig.java:299)
    ... 8 more
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: listNodes
[2025-01-01 08:05:55,687] INFO Stopped http_8083@6705fb02{HTTP/1.1, (http/1.1)}{0.0.0.0:8083} (org.eclipse.jetty.server.AbstractConnector:383)[JettyShutdownThread]
```
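The trace shows the process stopping once the cluster ID lookup against 192.168.2.88:13399 times out during startup. To confirm from inside the pod that the bootstrap address is unreachable at the TCP level, a probe along these lines may help. This is a minimal sketch and not part of MirrorMaker: `bootstrap_reachable` is a hypothetical helper name, the 3-second default timeout is an arbitrary assumption, and a successful TCP connect does not prove that the broker itself is healthy.

```python
import socket


def bootstrap_reachable(host, port, timeout_s=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout_s.

    Hypothetical diagnostic helper; it only checks that the port accepts
    connections, not that a Kafka broker is answering requests.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, DNS failures,
        # and connect timeouts.
        return False
```

Calling `bootstrap_reachable("192.168.2.88", 13399)` from the surviving data center would be expected to return `False` while the other DC is powered off, which matches the `Connection to node -1 ... could not be established` warning in the log.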
The MirrorMaker configuration is:
```
clusters = dc64, dc88
dc64.bootstrap.servers = 192.168.2.64:13399
dc88.bootstrap.servers = 192.168.2.88:13399
dc64->dc88.enabled = true
dc64->dc88.topics = .*
dc88->dc64.enabled = true
dc88->dc64.topics = .*
replication.factor=1
tasks.max=6
emit.checkpoints.interval.seconds=5
dc64.producer.acks=all
dc64.producer.batch.size=50000
dc64.consumer.auto.offset.reset=latest
dc88.consumer.auto.offset.reset=latest
dc64.consumer.max.poll.interval.ms=20000
dc88.consumer.max.poll.interval.ms=20000
refresh.topics.enabled=true
refresh.topics.interval.seconds=5
refresh.groups.enabled=true
refresh.groups.interval.seconds=5
dedicated.mode.enable.internal.rest = true
dc64.scheduled.rebalance.max.delay.ms=20000
dc88.scheduled.rebalance.max.delay.ms=20000
checkpoints.topic.replication.factor=1
heartbeats.topic.replication.factor=1
offset-syncs.topic.replication.factor=1
offset.storage.replication.factor=1
status.storage.replication.factor=1
config.storage.replication.factor=1
```
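With two clusters and both directions enabled, every replication flow in this configuration has each cluster as either its source or its target, so losing one cluster affects all flows. The sketch below illustrates this by parsing a subset of the configuration above. The helper names (`parse_properties`, `enabled_flows`, `flows_touching`) are hypothetical, and the parser handles only plain `key = value` lines rather than the full Java `.properties` syntax.

```python
def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def enabled_flows(props):
    """Return (source, target) pairs for '<source>-><target>.enabled = true'."""
    flows = []
    suffix = ".enabled"
    for key, value in props.items():
        if "->" in key and key.endswith(suffix) and value.lower() == "true":
            source, target = key[: -len(suffix)].split("->", 1)
            flows.append((source, target))
    return flows


def flows_touching(props, cluster):
    """List enabled flows that use the given cluster alias as source or target."""
    return [f"{s}->{t}" for s, t in enabled_flows(props) if cluster in (s, t)]


# Subset of the configuration reported in this issue.
CONFIG = """
clusters = dc64, dc88
dc64.bootstrap.servers = 192.168.2.64:13399
dc88.bootstrap.servers = 192.168.2.88:13399
dc64->dc88.enabled = true
dc88->dc64.enabled = true
refresh.topics.enabled=true
"""

props = parse_properties(CONFIG)
print(flows_touching(props, "dc88"))  # ['dc64->dc88', 'dc88->dc64']
```

Keys such as `refresh.topics.enabled` are ignored because they contain no `->`, so only the per-flow switches are counted.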