Re: Getting NotLeaderForPartitionException in kafka broker
I will try to reproduce this problem later this week. Bouncing the broker fixed the issue, but it surfaced again after a period of time. A little more context: the cluster is deployed on VMs, and I noticed that the issue appeared whenever CPU I/O wait was extremely high, e.g. around 90% of CPU time spent on I/O wait. I am more interested in understanding under what circumstances this issue happens so that I can take appropriate action.

On Fri, May 15, 2015 at 8:04 AM, Jiangjie Qin wrote:

> If you can reproduce this problem steadily, once you see this issue, can
> you grep the controller log for the topic partition in question and see if
> there is anything interesting?
>
> Thanks.
>
> Jiangjie (Becket) Qin

--
Regards,
Tao
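Since the problem seems to track the I/O wait spikes, a minimal way to watch for that correlation on the affected broker is sketched below; it assumes a Linux VM with the sysstat package installed and the broker's own logs under the logs/ directory of the Kafka installation (the default log4j location), so adjust paths as needed:

  # extended device stats plus %iowait, refreshed every 5 seconds
  iostat -x 5

  # in another terminal, watch for the fetcher errors as they happen
  tail -F logs/server.log | grep NotLeaderForPartition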
Re: Getting NotLeaderForPartitionException in kafka broker
If you can reproduce this problem steadily, once you see this issue, can you grep the controller log for the topic partition in question and see if there is anything interesting?

Thanks.

Jiangjie (Becket) Qin

On 5/14/15, 3:43 AM, "tao xiao" wrote:

> Yes, it does exist in ZK and the node that had the
> NotLeaderForPartitionException is the leader of the topic.
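For reference, that grep could look roughly like the following; it assumes the default log4j setup that writes the controller and state-change logs into the logs/ directory of the Kafka installation, and uses [topic1,0] from the error above as the partition in question:

  # look for leader elections / state changes involving the partition
  grep "\[topic1,0\]" logs/controller.log

  # the state-change log often has the matching detail on the broker side
  grep "\[topic1,0\]" logs/state-change.log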
Re: Getting NotLeaderForPartitionException in kafka broker
Can you try bouncing that broker?

Thanks,
Mayuresh

On Thu, May 14, 2015 at 3:43 AM, tao xiao wrote:

> Yes, it does exist in ZK and the node that had the
> NotLeaderForPartitionException is the leader of the topic.

--
-Regards,
Mayuresh R. Gharat
(862) 250-7125
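For completeness, a bounce with the scripts shipped in the Kafka distribution would look roughly like the sketch below; it assumes the broker is run from the installation directory with the server.properties shown later in the thread, and that a controlled shutdown is acceptable for this cluster:

  # ask the broker to shut down (controlled shutdown, if enabled, hands off leadership first)
  bin/kafka-server-stop.sh

  # once the process has exited, start it again with the same config
  bin/kafka-server-start.sh config/server.properties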
Re: Getting NotLeaderForPartitionException in kafka broker
Yes, it does exist in ZK and the node that had the NotLeaderForPartitionException is the leader of the topic.

On Thu, May 14, 2015 at 6:12 AM, Jiangjie Qin wrote:

> Does this topic exist in Zookeeper?

--
Regards,
Tao
Re: Getting NotLeaderForPartitionException in kafka broker
Does this topic exist in Zookeeper?

On 5/12/15, 11:35 PM, "tao xiao" wrote:

> Hi,
>
> Any updates on this issue? I keep seeing this issue happening over and
> over again.
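One quick way to answer that, assuming the ZooKeeper connect string from the posted config (ip:2181) and [topic1,0] as the affected partition, is to read the topic and partition-state znodes with the shell bundled with Kafka:

  # list all topics registered in ZooKeeper
  bin/zookeeper-shell.sh ip:2181 ls /brokers/topics

  # show the current leader, leader epoch and ISR for partition 0
  bin/zookeeper-shell.sh ip:2181 get /brokers/topics/topic1/partitions/0/state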
Re: Getting NotLeaderForPartitionException in kafka broker
Hi,

Any updates on this issue? I keep seeing it happen over and over again.

On Thu, May 7, 2015 at 7:28 PM, tao xiao wrote:

> Hi team,
>
> I have a 12-node cluster with 800 topics, each of which has only 1
> partition. I observed that one of the nodes keeps generating
> NotLeaderForPartitionException, which causes the node to become
> unresponsive to all requests. Below is the exception:
>
> [2015-05-07 04:16:01,014] ERROR [ReplicaFetcherThread-1-12], Error for
> partition [topic1,0] to broker 12:class
> kafka.common.NotLeaderForPartitionException
> (kafka.server.ReplicaFetcherThread)
>
> All other nodes in the cluster also generate lots of replication errors,
> as shown below, due to the unresponsiveness of the above node:
>
> [2015-05-07 04:17:34,917] WARN [Replica Manager on Broker 1]: Fetch
> request with correlation id 3630911 from client ReplicaFetcherThread-0-1 on
> partition [topic1,0] failed due to Leader not local for partition
> [cg22_user.item_attr_info.lcr,0] on broker 1 (kafka.server.ReplicaManager)
>
> Any suggestion why the node runs into this unstable state, and is there
> any configuration I can set to prevent it?
>
> I use kafka 0.8.2.1.
>
> And here is the server.properties:
>
> broker.id=5
> port=9092
> num.network.threads=3
> num.io.threads=8
> socket.send.buffer.bytes=1048576
> socket.receive.buffer.bytes=1048576
> socket.request.max.bytes=104857600
> log.dirs=/mnt/kafka
> num.partitions=1
> num.recovery.threads.per.data.dir=1
> log.retention.hours=1
> log.segment.bytes=1073741824
> log.retention.check.interval.ms=30
> log.cleaner.enable=false
> zookeeper.connect=ip:2181
> zookeeper.connection.timeout.ms=6000
> unclean.leader.election.enable=false
> delete.topic.enable=true
> default.replication.factor=3
> num.replica.fetchers=3
> delete.topic.enable=true
> kafka.metrics.reporters=report.KafkaMetricsCollector
> straas.hubble.conf.file=/etc/kafka/report.conf
>
> --
> Regards,
> Tao

--
Regards,
Tao
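For anyone following along, a quick way to see which broker the cluster metadata says is the leader for the affected partition, and whether its ISR has shrunk, is the topic tool shipped with the distribution; the ZooKeeper address below is the one from the config above, and the --under-replicated-partitions option is assumed to be available in this 0.8.2.1 build:

  # show leader, replica assignment and ISR for the topic
  bin/kafka-topics.sh --describe --zookeeper ip:2181 --topic topic1

  # list partitions whose ISR is currently smaller than the replica set
  bin/kafka-topics.sh --describe --zookeeper ip:2181 --under-replicated-partitions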