[ https://issues.apache.org/jira/browse/KAFKA-12345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alok Nikhil updated KAFKA-12345:
--------------------------------
    Description:
Occasionally, a scheduler thread on a broker crashes with this stack trace:

{code:java}
[2021-02-19 01:04:24,683] ERROR Uncaught exception in scheduled task 'send-alter-isr' (kafka.utils.KafkaScheduler)
java.lang.NullPointerException
    at kafka.server.AlterIsrManagerImpl.sendRequest(AlterIsrManager.scala:117)
    at kafka.server.AlterIsrManagerImpl.propagateIsrChanges(AlterIsrManager.scala:85)
    at kafka.server.AlterIsrManagerImpl.$anonfun$start$1(AlterIsrManager.scala:66)
    at kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:114)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
{code}

After that, the broker is unable to fetch any records from any other broker (and vice versa):

{code:java}
[2021-02-19 01:05:07,000] INFO [ReplicaFetcher replicaId=0, leaderId=4, fetcherId=0] Error sending fetch request (sessionId=164432409 2, epoch=957) to node 4: (org.apache.kafka.clients.FetchSessionHandler)
java.io.IOException: Connection to 4 was disconnected before the response was read
    at org.apache.kafka.clients.NetworkClientUtils.sendAndReceive(NetworkClientUtils.java:100)
    at kafka.server.ReplicaFetcherBlockingSend.sendRequest(ReplicaFetcherBlockingSend.scala:110)
    at kafka.server.ReplicaFetcherThread.fetchFromLeader(ReplicaFetcherThread.scala:215)
    at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:313)
    at kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3(AbstractFetcherThread.scala:139)
    at kafka.server.AbstractFetcherThread.maybeFetch(AbstractFetcherThread.scala:138)
    at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:121)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96)
{code}

> KIP-500: AlterIsrManager crashes on broker idle-state
> -----------------------------------------------------
>
>                 Key: KAFKA-12345
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12345
>             Project: Kafka
>          Issue Type: Task
>          Components: core
>            Reporter: Alok Nikhil
>            Priority: Major
>              Labels: kip-500