[ https://issues.apache.org/jira/browse/KAFKA-14072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mickael Maison resolved KAFKA-14072. ------------------------------------ Fix Version/s: 3.5.0 Resolution: Fixed This looks like it's the same issue as KAFKA-14545 > Crashed MirrorCheckpointConnector appears as running in REST API > ---------------------------------------------------------------- > > Key: KAFKA-14072 > URL: https://issues.apache.org/jira/browse/KAFKA-14072 > Project: Kafka > Issue Type: Bug > Components: KafkaConnect, mirrormaker > Affects Versions: 3.1.0 > Reporter: Mickael Maison > Priority: Major > Fix For: 3.5.0 > > > In one cluster I had a partially crashed MirrorCheckpointConnector instance. > It had stopped mirroring offsets and emitting metrics completely but the > connector and its single task were still reporting as running in the REST API. > Looking at the logs, I found this stacktrace: > {code:java} > java.lang.NullPointerException > at > org.apache.kafka.connect.mirror.MirrorCheckpointTask.checkpoint(MirrorCheckpointTask.java:187) > at > org.apache.kafka.connect.mirror.MirrorCheckpointTask.lambda$checkpointsForGroup$2(MirrorCheckpointTask.java:171) > at > java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195) > at > java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177) > at > java.base/java.util.HashMap$EntrySpliterator.forEachRemaining(HashMap.java:1764) > at > java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484) > at > java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474) > at > java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913) > at > java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at > java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578) > at > org.apache.kafka.connect.mirror.MirrorCheckpointTask.checkpointsForGroup(MirrorCheckpointTask.java:173) > at > org.apache.kafka.connect.mirror.MirrorCheckpointTask.sourceRecordsForGroup(MirrorCheckpointTask.java:157) > at > org.apache.kafka.connect.mirror.MirrorCheckpointTask.poll(MirrorCheckpointTask.java:139) > at > org.apache.kafka.connect.runtime.WorkerSourceTask.poll(WorkerSourceTask.java:291) > at > org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:248) > at > org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:186) > at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:241) > at > java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:829) > WARN [prod-source->sc-prod-target.MirrorCheckpointConnector|task-0] Failure > polling consumer state for checkpoints. > (org.apache.kafka.connect.mirror.MirrorCheckpointTask) > [task-thread-prod-source->sc-prod-target.MirrorCheckpointConnector-0] > {code} > Not sure if it's related but prior this exception, there's quite a lot of: > {code:java} > ERROR [prod-source->sc-prod-target.MirrorCheckpointConnector|task-0] > WorkerSourceTask{id=prod-source->sc-prod-target.MirrorCheckpointConnector-0} > failed to send record to prod-source.checkpoints.internal: > (org.apache.kafka.connect.runtime.WorkerSourceTask) > [kafka-producer-network-thread | > connector-producer-prod-source->sc-prod-target.MirrorCheckpointConnector-0] > org.apache.kafka.common.KafkaException: Producer is closed forcefully. > at > org.apache.kafka.clients.producer.internals.RecordAccumulator.abortBatches(RecordAccumulator.java:760) > at > org.apache.kafka.clients.producer.internals.RecordAccumulator.abortIncompleteBatches(RecordAccumulator.java:747) > at > org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:283) > at java.base/java.lang.Thread.run(Thread.java:829) > {code} > and some users had started consumers in the target cluster hence causing > these log lines: > {code:java} > ERROR [prod-source->sc-prod-target.MirrorCheckpointConnector|task-0] > [AdminClient clientId=adminclient-137] OffsetCommit request for group id > <GROUP_ID> and partition <TP> failed due to unexpected error > UNKNOWN_MEMBER_ID. > (org.apache.kafka.clients.admin.internals.AlterConsumerGroupOffsetsHandler) > [kafka-admin-client-thread | adminclient-137] > {code} > Unfortunately I don't have the full history, so it's unclear if this happened > while stopping but the connector stayed in this state for several hours until > it was explicitly deleted via the REST API. -- This message was sent by Atlassian Jira (v8.20.10#820010)