[ https://issues.apache.org/jira/browse/KAFKA-5195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16170824#comment-16170824 ]

Alexey Pervushin commented on KAFKA-5195:
-----------------------------------------

We hit the same issue. It happened once.
In server.log there is an enormous number of messages like:
{noformat}
[2017-09-02 14:02:19,074] ERROR {ReplicaFetcherThread-0-123} [ReplicaFetcherThread-0-123], Error for partition [partition_name, 0] to broker 123:org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition. (kafka.server.ReplicaFetcherThread)
{noformat}
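
For what it's worth, here is a minimal sketch of how one could check which broker ZooKeeper currently records as the leader for an affected partition; the topic name, partition number and ZooKeeper address are placeholders. If the leader recorded in the state znode is not 123, the fetcher is clearly running with stale leadership metadata.
{noformat}
import org.apache.zookeeper.ZooKeeper;
import java.nio.charset.StandardCharsets;

public class PartitionLeaderCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder connect string, topic and partition -- adjust for the cluster.
        ZooKeeper zk = new ZooKeeper("10.69.102.249:2181", 30000, event -> { });
        // Kafka keeps the current leader/ISR of each partition in this znode,
        // as JSON along the lines of {"leader":123,"leader_epoch":7,"isr":[123,124,125]}.
        byte[] data = zk.getData("/brokers/topics/partition_name/partitions/0/state", false, null);
        System.out.println(new String(data, StandardCharsets.UTF_8));
        zk.close();
    }
}
{noformat}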

Shortly before that I noticed a pretty significant spike in GC collection time and a lot of messages about network issues to/from this broker (probably because of the GC freeze), like:
{noformat}
[2017-09-02 14:04:53,550] WARN {main-SendThread(10.69.102.249:2181)} Client session timed out, have not heard from server in 4371ms for sessionid 0x35de55b9852d48b (org.apache.zookeeper.ClientCnxn)
...
[2017-09-02 14:05:28,739] WARN {main-SendThread(10.69.145.152:2181)} Unable to reconnect to ZooKeeper service, session 0x35de55b9852d48b has expired (org.apache.zookeeper.ClientCnxn)
{noformat}
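
The session expiry above suggests the GC pause outlasted the broker's ZooKeeper session timeout (zookeeper.session.timeout.ms, which defaults to 6000 ms and is not overridden in our config below). Assuming the pauses themselves cannot be eliminated, one possible mitigation is to give the session more headroom, e.g.:
{noformat}
# Not set in the broker config below, so the 6000 ms default applies.
# A larger value lets the broker survive longer GC pauses without its
# ephemeral registration under /brokers/ids expiring.
zookeeper.session.timeout.ms=30000
zookeeper.connection.timeout.ms=30000
{noformat}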
and
{noformat}
[2017-09-02 14:02:18,655] WARN {ReplicaFetcherThread-0-124} [ReplicaFetcherThread-0-124], Error in fetch kafka.server.ReplicaFetcherThread$FetchRequest@33884d9f (kafka.server.ReplicaFetcherThread)
java.io.IOException: Connection to 124 was disconnected before the response was read
        at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:114)
        at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:112)
        at scala.Option.foreach(Option.scala:257)
        at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:112)
        at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:108)
        at kafka.utils.NetworkClientBlockingOps$.recursivePoll$1(NetworkClientBlockingOps.scala:136)
        at kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollContinuously$extension(NetworkClientBlockingOps.scala:142)
        at kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:108)
        at kafka.server.ReplicaFetcherThread.sendRequest(ReplicaFetcherThread.scala:249)
        at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:234)
        at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:42)
        at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:118)
        at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:103)
        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
{noformat}

We use Kafka 0.10.2.1 plus the backported patch from 
https://issues.apache.org/jira/browse/KAFKA-5413, running on 
java-8-oracle 1.8.0.92 with G1 as the GC.

Broker configuration:
{noformat}
broker.id=<ID>
log.dirs=<PATH>
zookeeper.connect=...

auto.create.topics.enable=true
connections.max.idle.ms=3600000
default.replication.factor=3
delete.topic.enable=true
group.max.session.timeout.ms=300000
inter.broker.protocol.version=0.10.2.0
log.cleaner.dedupe.buffer.size=536870912
log.cleaner.enable=true
log.message.format.version=0.9.0.1
log.retention.hours=72
log.segment.bytes=268435456
message.max.bytes=1000000
min.insync.replicas=2
num.io.threads=5
offsets.retention.minutes=4320
offsets.topic.segment.bytes=104857600
replica.fetch.max.bytes=10485760
request.timeout.ms=300001
reserved.broker.max.id=2113929216
unclean.leader.election.enable=false
{noformat}
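
It may also be worth confirming whether the broker managed to re-register with ZooKeeper after the session expired. A small sketch in the same style as above, listing the live broker ids (the connect string is again a placeholder); this is the same check the original report below describes doing manually under /brokers/ids:
{noformat}
import org.apache.zookeeper.ZooKeeper;

public class BrokerRegistrationCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder connect string -- use the cluster's zookeeper.connect value.
        ZooKeeper zk = new ZooKeeper("10.69.102.249:2181", 30000, event -> { });
        // Every live broker keeps an ephemeral child znode here; a broker whose
        // session expired and that never re-registered will be missing.
        System.out.println(zk.getChildren("/brokers/ids", false));
        zk.close();
    }
}
{noformat}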

> Endless NotLeaderForPartitionException for ReplicaFetcherThread
> ---------------------------------------------------------------
>
>                 Key: KAFKA-5195
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5195
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.10.1.1
>         Environment: 3 Kafka brokers on top of Kubernetes, using Docker image 
> wurstmeister/kafka:0.10.1.1.
> Environment variables:
>       KAFKA_ADVERTISED_HOST_NAME:     kafka-ypimp-2
>       KAFKA_ADVERTISED_PORT:          9092
>       KAFKA_ZOOKEEPER_CONNECT:                
> zookeeper-ypimp-0:2181,zookeeper-ypimp-1:2181,zookeeper-ypimp-2:2181
>       KAFKA_DELETE_TOPIC_ENABLE:      true
>       KAFKA_BROKER_ID:                        2
>       JMX_PORT:                               1099
>       KAFKA_JMX_OPTS:                 -Dcom.sun.management.jmxremote 
> -Dcom.sun.management.jmxremote.authenticate=false 
> -Dcom.sun.management.jmxremote.ssl=false 
> -Djava.rmi.server.hostname=kafka-ypimp-2.default.svc.cluster.local 
> -Dcom.sun.management.jmxremote.rmi.port=1099
>       KAFKA_LOG_RETENTION_HOURS:      96
>       KAFKA_AUTO_CREATE_TOPICS_ENABLE:        false
> Zookeeper version: 3.4.8. 
> Number of Zk nodes: 3.
>            Reporter: Andrea Gardiman
>
> One of the 3 brokers is suddenly in a bad state. It endlessly prints out the 
> following message, for every partition: 
> [2017-05-08 13:51:16,748] ERROR [ReplicaFetcherThread-0-0], Error for 
> partition [partition_name,5] to broker 
> 0:org.apache.kafka.common.errors.NotLeaderForPartitionException: This server 
> is not the leader for that topic-partition. 
> (kafka.server.ReplicaFetcherThread)
> In zookeeper, under /brokers/ids, I can't find the zkNode for broker 2. There 
> are only the zkNodes 0 and 1.
> What kind of error can this be?
> Please let me know if you need more information; I don't know how to 
> properly debug it.
> Many thanks.


