[ https://issues.apache.org/jira/browse/KAFKA-6582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16741840#comment-16741840 ]
GaoLiang commented on KAFKA-6582:
---------------------------------

Almost the same issue with a fresh install of version 2.1.

Environment:
Ubuntu 16.04
Linux 4.4.0-141-generic

> Partitions get underreplicated, with a single ISR, and doesn't recover. Other brokers do not take over and we need to manually restart the broker.
> --------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-6582
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6582
>             Project: Kafka
>          Issue Type: Bug
>          Components: network
>    Affects Versions: 1.0.0
>         Environment: Ubuntu 16.04
> Linux kafka04 4.4.0-109-generic #132-Ubuntu SMP Tue Jan 9 19:52:39 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
> java version "9.0.1"
> Java(TM) SE Runtime Environment (build 9.0.1+11)
> Java HotSpot(TM) 64-Bit Server VM (build 9.0.1+11, mixed mode)
> but also tried with the latest JVM 8 before with the same result.
>            Reporter: Jurriaan Pruis
>            Priority: Major
>
> Partitions get underreplicated, with a single ISR, and doesn't recover. Other
> brokers do not take over and we need to manually restart the 'single ISR'
> broker (if you describe the partitions of replicated topic it is clear that
> some partitions are only in sync on this broker).
> This bug resembles KAFKA-4477 a lot, but since that issue is marked as
> resolved this is probably something else but similar.
> We have the same issue (or at least it looks pretty similar) on Kafka 1.0.
> Since upgrading to Kafka 1.0 in November 2017 we've had these issues (we've
> upgraded from Kafka 0.10.2.1).
> This happens almost every 24-48 hours on a random broker. This is why we
> currently have a cronjob which restarts every broker every 24 hours.
> During this issue the ISR shows the following server log:
> {code:java}
> [2018-02-20 12:02:08,342] WARN Attempting to send response via channel for which there is no open connection, connection id 10.132.0.32:9092-10.14.148.20:56352-96708 (kafka.network.Processor)
> [2018-02-20 12:02:08,364] WARN Attempting to send response via channel for which there is no open connection, connection id 10.132.0.32:9092-10.14.150.25:54412-96715 (kafka.network.Processor)
> [2018-02-20 12:02:08,349] WARN Attempting to send response via channel for which there is no open connection, connection id 10.132.0.32:9092-10.14.149.18:35182-96705 (kafka.network.Processor)
> [2018-02-20 12:02:08,379] WARN Attempting to send response via channel for which there is no open connection, connection id 10.132.0.32:9092-10.14.150.25:54456-96717 (kafka.network.Processor)
> [2018-02-20 12:02:08,448] WARN Attempting to send response via channel for which there is no open connection, connection id 10.132.0.32:9092-10.14.159.20:36388-96720 (kafka.network.Processor)
> [2018-02-20 12:02:08,683] WARN Attempting to send response via channel for which there is no open connection, connection id 10.132.0.32:9092-10.14.157.110:41922-96740 (kafka.network.Processor)
> {code}
> Also on the ISR broker, the controller log shows this:
> {code:java}
> [2018-02-20 12:02:14,927] INFO [Controller-3-to-broker-3-send-thread]: Controller 3 connected to 10.132.0.32:9092 (id: 3 rack: null) for sending state change requests (kafka.controller.RequestSendThread)
> [2018-02-20 12:02:14,927] INFO [Controller-3-to-broker-0-send-thread]: Controller 3 connected to 10.132.0.10:9092 (id: 0 rack: null) for sending state change requests (kafka.controller.RequestSendThread)
> [2018-02-20 12:02:14,928] INFO [Controller-3-to-broker-1-send-thread]: Controller 3 connected to 10.132.0.12:9092 (id: 1 rack: null) for sending state change requests (kafka.controller.RequestSendThread)
> {code}
> And the non-ISR brokers show these kind of errors:
>
> {code:java}
> 2018-02-20 12:02:29,204] WARN [ReplicaFetcher replicaId=1, leaderId=3, fetcherId=0] Error in fetch to broker 3, request (type=FetchRequest, replicaId=1, maxWait=500, minBytes=1, maxBytes=10485760, fetchData={......................}, isolationLevel=READ_UNCOMMITTED) (kafka.server.ReplicaFetcherThread)
> java.io.IOException: Connection to 3 was disconnected before the response was read
>     at org.apache.kafka.clients.NetworkClientUtils.sendAndReceive(NetworkClientUtils.java:95)
>     at kafka.server.ReplicaFetcherBlockingSend.sendRequest(ReplicaFetcherBlockingSend.scala:96)
>     at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:205)
>     at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:41)
>     at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:149)
>     at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:113)
>     at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:64)
> {code}
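
For anyone trying to confirm they are hitting the same state, the check the reporter describes (describing the partitions and seeing that some are only in sync on one broker) can be scripted. What follows is only a rough sketch, not something from the original report: it assumes the tab-separated `Topic: / Partition: / Leader: / Replicas: / Isr:` lines that `kafka-topics.sh --describe` printed in the 1.0 and 2.1 era, the function name `find_single_isr_partitions` is hypothetical, and the `SAMPLE` text is made up for illustration (it only reuses broker ids 0, 1 and 3 from the logs above).

```python
import re

# One partition line of `kafka-topics.sh --describe` output.
# ASSUMPTION: the field layout below matches your Kafka version's output.
PARTITION_LINE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+"
    r"Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>-?\d+|none)\s+"
    r"Replicas:\s*(?P<replicas>[\d,]+)\s+"
    r"Isr:\s*(?P<isr>[\d,]*)"
)


def find_single_isr_partitions(describe_output):
    """Return (topic, partition, isr_broker) tuples for partitions whose ISR
    has shrunk to exactly one broker although more replicas are assigned."""
    hits = []
    for line in describe_output.splitlines():
        match = PARTITION_LINE.search(line)
        if match is None:
            continue  # skips the per-topic summary line and blank lines
        replicas = [int(b) for b in match.group("replicas").split(",") if b]
        isr = [int(b) for b in match.group("isr").split(",") if b]
        if len(replicas) > 1 and len(isr) == 1:
            hits.append(
                (match.group("topic"), int(match.group("partition")), isr[0])
            )
    return hits


# Made-up describe output, for illustration only (not taken from the report).
SAMPLE = (
    "Topic:events\tPartitionCount:3\tReplicationFactor:3\tConfigs:\n"
    "\tTopic: events\tPartition: 0\tLeader: 3\tReplicas: 3,0,1\tIsr: 3\n"
    "\tTopic: events\tPartition: 1\tLeader: 0\tReplicas: 0,1,3\tIsr: 0,1,3\n"
    "\tTopic: events\tPartition: 2\tLeader: 3\tReplicas: 3,1,0\tIsr: 3\n"
)

if __name__ == "__main__":
    for topic, partition, broker in find_single_isr_partitions(SAMPLE):
        print(f"{topic}-{partition}: only broker {broker} is in sync")
```

If every flagged partition names the same broker, that would match the 'single ISR' broker pattern described in the issue. The sketch only reads text, so the describe output can be piped in from wherever the command is run.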