[ https://issues.apache.org/jira/browse/KAFKA-6375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16299342#comment-16299342 ]
Rong Tang edited comment on KAFKA-6375 at 12/21/17 12:53 AM:
-------------------------------------------------------------

Hi, [~huxi_2b]

Any thoughts on why the exception "Unable to establish loopback connection" happens, or on how to handle it?

Another broker hit this exception again, and its replicas stayed out of sync for 2 days until I restarted it. Both brokers had been the controller before I restarted them; I am not sure whether that is related. I only see the exception when starting a broker.

Thanks.

> Follower replicas can never catch up to be ISR due to creating ReplicaFetcherThread failed.
> -------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-6375
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6375
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.10.1.0
>         Environment: Windows, 23 brokers KafkaCluster
>            Reporter: Rong Tang
>
> Hi, I hit a case where the out-of-sync replicas on one broker never catch up.
> When the broker starts up, it receives LeaderAndIsr requests from the controller, which call createFetcherThread. The thread creation failed with the exceptions below.
> After that, there is no fetcher for these follower replicas, so they stay out of sync indefinitely unless the broker later receives a LeaderAndIsr request with a higher leader epoch. The broker had 260 of its 330 replicas out of sync for one day, until I restarted it.
> Restarting the broker mitigates the issue.
> I have 2 questions.
> First, why did creating the new ReplicaFetcherThread fail?
> *Second, should Kafka do something to fail over, instead of leaving the broker in an abnormal state?*
> It is a 23-broker Kafka cluster running on Windows. Each broker has 330 replicas.
>
> [2017-12-13 16:29:21,317] ERROR Error on broker 1000 while processing LeaderAndIsr request with correlationId 1 received from controller 427703487 epoch 22 (state.change.logger)
> org.apache.kafka.common.KafkaException: java.io.IOException: Unable to establish loopback connection
>         at org.apache.kafka.common.network.Selector.<init>(Selector.java:124)
>         at kafka.server.ReplicaFetcherThread.<init>(ReplicaFetcherThread.scala:87)
>         at kafka.server.ReplicaFetcherManager.createFetcherThread(ReplicaFetcherManager.scala:35)
>         at kafka.server.AbstractFetcherManager$$anonfun$addFetcherForPartitions$2.apply(AbstractFetcherManager.scala:83)
>         at kafka.server.AbstractFetcherManager$$anonfun$addFetcherForPartitions$2.apply(AbstractFetcherManager.scala:78)
>         at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
>         at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:221)
>         at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:428)
>         at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:428)
>         at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
>         at kafka.server.AbstractFetcherManager.addFetcherForPartitions(AbstractFetcherManager.scala:78)
>         at kafka.server.ReplicaManager.makeFollowers(ReplicaManager.scala:869)
>         at kafka.server.ReplicaManager.becomeLeaderOrFollower(ReplicaManager.scala:689)
>         at kafka.server.KafkaApis.handleLeaderAndIsrRequest(KafkaApis.scala:149)
>         at kafka.server.KafkaApis.handle(KafkaApis.scala:83)
>         at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Unable to establish loopback connection
>         at sun.nio.ch.PipeImpl$Initializer.run(PipeImpl.java:94)
>         at sun.nio.ch.PipeImpl$Initializer.run(PipeImpl.java:61)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at sun.nio.ch.PipeImpl.<init>(PipeImpl.java:171)
>         at sun.nio.ch.SelectorProviderImpl.openPipe(SelectorProviderImpl.java:50)
>         at java.nio.channels.Pipe.open(Pipe.java:155)
>         at sun.nio.ch.WindowsSelectorImpl.<init>(WindowsSelectorImpl.java:127)
>         at sun.nio.ch.WindowsSelectorProvider.openSelector(WindowsSelectorProvider.java:44)
>         at java.nio.channels.Selector.open(Selector.java:227)
>         at org.apache.kafka.common.network.Selector.<init>(Selector.java:122)
>         ... 16 more
> Caused by: java.net.ConnectException: Connection timed out: connect
>         at sun.nio.ch.Net.connect0(Native Method)
>         at sun.nio.ch.Net.connect(Net.java:454)
>         at sun.nio.ch.Net.connect(Net.java:446)
>         at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648)
>         at java.nio.channels.SocketChannel.open(SocketChannel.java:189)
>         at sun.nio.ch.PipeImpl$Initializer$LoopbackConnector.run(PipeImpl.java:127)
>         at sun.nio.ch.PipeImpl$Initializer.run(PipeImpl.java:76)
>         ... 25 more

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
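
For anyone trying to reproduce this outside Kafka: the stack trace above shows the failure originating in java.nio.channels.Selector.open(), which on Windows opens a Pipe over a loopback socket. The sketch below is a standalone check that could be run on an affected host to see whether Selector.open() fails there and how long each attempt takes. The class name LoopbackSelectorCheck and the helper openWithRetry are hypothetical and are not part of Kafka; the bounded retry is only an illustration, and whether retrying would actually help on the reporter's brokers is an assumption, since the root cause of the loopback timeout is not established in this thread.

```java
import java.io.IOException;
import java.nio.channels.Selector;

public class LoopbackSelectorCheck {

    // Hypothetical helper (not Kafka code): call Selector.open() up to
    // maxAttempts times, sleeping backoffMs between failed attempts.
    // If every attempt fails, the last IOException is rethrown.
    static Selector openWithRetry(int maxAttempts, long backoffMs)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            long start = System.nanoTime();
            try {
                // On Windows this is the call that opens the loopback pipe
                // seen in the stack trace (WindowsSelectorImpl -> Pipe.open).
                return Selector.open();
            } catch (IOException e) {
                last = e;
                long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                System.err.printf("attempt %d failed after %d ms: %s%n",
                        attempt, elapsedMs, e);
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMs);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Selector is Closeable, so try-with-resources releases it.
        try (Selector selector = openWithRetry(3, 1000L)) {
            System.out.println("Selector opened: " + selector.isOpen());
        }
    }
}
```

If the check fails consistently on a broker host, that would point at the host's loopback networking (for example local firewall or port exhaustion, both of which are guesses here) rather than at Kafka itself.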