[ https://issues.apache.org/jira/browse/KAFKA-19697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Fan Yang reassigned KAFKA-19697:
--------------------------------
Assignee: Fan Yang
> NPE Cannot invoke org.apache.kafka.connect.runtime.ConnectMetrics$MetricGroup.close()
> -------------------------------------------------------------------------------------
>
> Key: KAFKA-19697
> URL: https://issues.apache.org/jira/browse/KAFKA-19697
> Project: Kafka
> Issue Type: Bug
> Components: connect
> Affects Versions: 4.0.0
> Environment: Kafka Connect cluster with 20 workers running in Kubernetes,
> using home-built Kafka images based on eclipse-temurin:21-jre-alpine-3.21
> and the official Kafka 4.0.0 distribution. Brokers are on version 3.9.
> Reporter: Martin Andersson
> Assignee: Fan Yang
> Priority: Major
> Labels: connect, connect-worker
>
> Several tasks across multiple sink connectors in a long-running Connect
> cluster failed spontaneously (within a couple of hours) with the following
> stack trace:
> {code:java}
> java.lang.NullPointerException: Cannot invoke "org.apache.kafka.connect.runtime.ConnectMetrics$MetricGroup.close()" because the return value of "java.util.concurrent.ConcurrentMap.get(Object)" is null
> at org.apache.kafka.connect.runtime.Worker$ConnectorStatusMetricsGroup.recordTaskRemoved(Worker.java:2333)
> at org.apache.kafka.connect.runtime.Worker.startTask(Worker.java:707)
> at org.apache.kafka.connect.runtime.Worker.startSinkTask(Worker.java:568)
> at org.apache.kafka.connect.runtime.distributed.DistributedHerder.startTask(DistributedHerder.java:2009)
> at org.apache.kafka.connect.runtime.distributed.DistributedHerder.lambda$getTaskStartingCallable$39(DistributedHerder.java:2059)
> at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
> at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.base/java.lang.Thread.run(Unknown Source)
> {code}
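The first trace is the JDK's helpful NullPointerException for a method called directly on the result of a map lookup. `recordTaskRemoved` evidently called `close()` on the value returned by `ConcurrentMap.get(Object)` for a task key that was not present in the map. The sketch below reproduces that pattern in plain Java alongside a null-safe variant. It is a hypothetical illustration and not Kafka's code. `TaskMetricsRegistrySketch`, its nested `MetricGroup` stand-in, and the task id strings are all invented for this example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class TaskMetricsRegistrySketch {

    // Hypothetical stand-in for a closeable per-task metric group.
    // This is NOT Kafka's ConnectMetrics.MetricGroup.
    static final class MetricGroup implements AutoCloseable {
        private volatile boolean closed = false;

        @Override
        public void close() {
            closed = true;
        }

        boolean isClosed() {
            return closed;
        }
    }

    // Hypothetical registry of metric groups keyed by task id.
    private final ConcurrentMap<String, MetricGroup> groups = new ConcurrentHashMap<>();

    MetricGroup recordTaskAdded(String taskId) {
        // computeIfAbsent registers at most one group per task id.
        return groups.computeIfAbsent(taskId, id -> new MetricGroup());
    }

    // Unsafe pattern: get(...) returns null for an unknown task id, so the
    // chained close() throws NullPointerException.
    void recordTaskRemovedUnsafe(String taskId) {
        groups.get(taskId).close();
        groups.remove(taskId);
    }

    // Null-safe pattern: remove(...) detaches the entry atomically and returns
    // null when nothing was registered, so a repeated removal is a no-op.
    boolean recordTaskRemovedSafe(String taskId) {
        MetricGroup group = groups.remove(taskId);
        if (group == null) {
            return false;
        }
        group.close();
        return true;
    }

    public static void main(String[] args) {
        TaskMetricsRegistrySketch registry = new TaskMetricsRegistrySketch();
        MetricGroup group = registry.recordTaskAdded("sink-connector-0");
        System.out.println(registry.recordTaskRemovedSafe("sink-connector-0")); // true
        System.out.println(group.isClosed());                                   // true
        System.out.println(registry.recordTaskRemovedSafe("sink-connector-0")); // false
        try {
            registry.recordTaskRemovedUnsafe("sink-connector-0");
        } catch (NullPointerException e) {
            // With helpful NPE messages (JEP 358), the message names
            // ConcurrentMap.get(Object), matching the shape of the report.
            System.out.println("NullPointerException: " + e.getMessage());
        }
    }
}
```

When `remove()` serves as the lookup, the cleanup also becomes idempotent. That helps if two code paths, such as a failed start and a later stop, both try to unregister the same task. This report does not establish whether that is what happens inside the Worker.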
> Restarting the failed tasks via the REST API led to another task failure
> with the following stack trace:
> {code:java}
> java.lang.NullPointerException: Cannot invoke "java.util.Map.size()" because "inputMap" is null
> at org.apache.kafka.common.utils.Utils.castToStringObjectMap(Utils.java:1476)
> at org.apache.kafka.common.config.AbstractConfig.<init>(AbstractConfig.java:112)
> at org.apache.kafka.common.config.AbstractConfig.<init>(AbstractConfig.java:146)
> at org.apache.kafka.connect.runtime.TaskConfig.<init>(TaskConfig.java:51)
> at org.apache.kafka.connect.runtime.Worker.startTask(Worker.java:661)
> at org.apache.kafka.connect.runtime.Worker.startSinkTask(Worker.java:568)
> at org.apache.kafka.connect.runtime.distributed.DistributedHerder.startTask(DistributedHerder.java:2009)
> at org.apache.kafka.connect.runtime.distributed.DistributedHerder.lambda$getTaskStartingCallable$39(DistributedHerder.java:2059)
> at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
> at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.base/java.lang.Thread.run(Unknown Source)
> {code}
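The second trace fails one level further down. `Utils.castToStringObjectMap` calls `size()` on a null `inputMap`, so the configuration map passed to the `TaskConfig` constructor appears to have been null. The hypothetical sketch below uses plain Java and no Kafka classes; `TaskConfigLookupSketch` and its methods are invented names. It contrasts two ways of building a config from a looked-up map. Copying the map directly fails inside the copy, away from the lookup that returned null. Checking the lookup result first surfaces the missing config at its source.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class TaskConfigLookupSketch {

    // Hypothetical snapshot of task configurations keyed by task id.
    private final Map<String, Map<String, String>> taskConfigs = new HashMap<>();

    void putTaskConfig(String taskId, Map<String, String> config) {
        taskConfigs.put(taskId, Map.copyOf(config));
    }

    // Unsafe: if the lookup returns null, the HashMap copy constructor throws
    // NullPointerException inside the copy, away from the actual cause.
    Map<String, Object> buildConfigUnsafe(String taskId) {
        Map<String, Object> copy = new HashMap<>(taskConfigs.get(taskId));
        return copy;
    }

    // Guarded: a missing configuration is reported at the lookup site.
    Optional<Map<String, Object>> buildConfigSafe(String taskId) {
        Map<String, String> raw = taskConfigs.get(taskId);
        if (raw == null) {
            return Optional.empty();
        }
        Map<String, Object> copy = new HashMap<>(raw);
        return Optional.of(copy);
    }

    public static void main(String[] args) {
        TaskConfigLookupSketch lookup = new TaskConfigLookupSketch();
        lookup.putTaskConfig("sink-connector-0", Map.of("topics", "orders"));
        System.out.println(lookup.buildConfigSafe("sink-connector-0").isPresent()); // true
        System.out.println(lookup.buildConfigSafe("sink-connector-1").isPresent()); // false
        try {
            lookup.buildConfigUnsafe("sink-connector-1");
        } catch (NullPointerException e) {
            System.out.println("missing task config surfaced as an NPE");
        }
    }
}
```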
> Likely related to KAFKA-17719?
> The failed tasks did not show up in the _connector-failed-task-count_ metric
> (or in the _restarting/paused/failed_ task metrics), but they did disappear
> from the _connector-running-task-count_ metric.
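The metrics observation means a failing task left the running count without entering any other state count. One way to rule that out by construction is to derive every per-state count from a single task-to-state map, so that each status change is one transition rather than a separate removal and addition. The sketch below works under that assumption. It does not reflect how Kafka's `ConnectorStatusMetricsGroup` is known to work, and `TaskStateCountsSketch` and its `State` enum are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

public class TaskStateCountsSketch {

    // Hypothetical task states mirroring the metric names in the report.
    enum State { RUNNING, PAUSED, FAILED, RESTARTING }

    // Single source of truth: each task id maps to exactly one current state.
    private final Map<String, State> states = new HashMap<>();

    // Moving a task between states is one put(), so there is no window in which
    // the task has left RUNNING but has not yet been recorded as FAILED.
    synchronized void transition(String taskId, State next) {
        states.put(taskId, next);
    }

    // Only an explicit removal (e.g. the task was deleted) drops a task entirely.
    synchronized void remove(String taskId) {
        states.remove(taskId);
    }

    // Per-state counts are derived, never maintained as separate counters.
    synchronized int count(State state) {
        int n = 0;
        for (State s : states.values()) {
            if (s == state) {
                n++;
            }
        }
        return n;
    }

    synchronized int total() {
        return states.size();
    }

    public static void main(String[] args) {
        TaskStateCountsSketch counts = new TaskStateCountsSketch();
        counts.transition("sink-connector-0", State.RUNNING);
        counts.transition("sink-connector-1", State.RUNNING);
        // A task that fails while (re)starting is moved, not dropped.
        counts.transition("sink-connector-1", State.FAILED);
        System.out.println(counts.count(State.RUNNING)); // 1
        System.out.println(counts.count(State.FAILED));  // 1
        System.out.println(counts.total());              // 2
    }
}
```

With counts derived this way, the per-state counts always sum to `total()`. A task can drop out of the running count only by moving to another state or by an explicit `remove()`.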
> Restarting the whole Connect cluster resolved the issue.
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)