[ 
https://issues.apache.org/jira/browse/KAFKA-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oleg Lvovitch updated KAFKA-1557:
---------------------------------

    Description: 
TL;DR - after a topic is created, and at least one broker in the ISR is 
restarted, the ISR reported by the TopicMetadataResponse is incorrect.

Specific steps to repro:
- Download 0.8.1 Kafka
- Copy server.properties twice into server1.properties and server2.properties 
(attached) - basically just ports and log paths changed to allow brokers to 
co-exist
- Start ZooKeeper using "sh bin/zookeeper-server-start.sh 
config/zookeeper.properties"
- Start broker1: "sh bin/kafka-server-start.sh config/server1.properties"
- Start broker2: "sh bin/kafka-server-start.sh config/server2.properties"
- Create a new topic: "sh bin/kafka-topics.sh --zookeeper localhost:2181 
--create --topic test --replication-factor 2 --partitions 3"
- Examine topic state: "sh bin/kafka-topics.sh --zookeeper localhost:2181 
--describe --topic test" - note that all ISRs are of length 2
- Run the attached Scala code that uses TopicMetadataRequest to examine topic 
state. Observe that all ISRs are of length 2 and match the information output 
by the script
- Shut down broker2 (simply hit Ctrl-C in the terminal), wait 5-10 seconds
- Restart broker2 using the original command
- Check the status of the topic again. Observe that the leader for all 
partitions is 0 (as expected), and all ISRs contain both brokers (as expected)
- Run the attached Scala snippet again. 

EXPECTED:
- All ISRs are of length 2

ACTUAL:
- ALL ISRs contain just broker 0

NOTE: depending on how long broker2 was down, sometimes some ISRs will contain 
the full list, but shutting it down for 15+ secs seems to always yield a 
consistent repro
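
To make the EXPECTED vs ACTUAL comparison less error-prone than eyeballing two 
outputs, here is a minimal Python sketch that diffs the two views. It is not 
part of the attachments. It assumes the partition lines printed by 
"kafka-topics.sh --describe" look like "Topic: test  Partition: 0  Leader: 0  
Replicas: 0,1  Isr: 0,1", that the two brokers have ids 0 and 1, and that the 
ISR lists from the TopicMetadataResponse are typed in by hand from whatever the 
Scala snippet prints. The function names are hypothetical.

```python
import re

# Assumed shape of one partition line from `kafka-topics.sh --describe`;
# adjust the pattern if your Kafka version prints the fields differently.
PARTITION_LINE = re.compile(r"Partition:\s*(\d+).*?Isr:\s*([\d,]*)")


def parse_describe_isr(describe_output):
    """Map partition id -> sorted ISR broker ids from --describe output."""
    isr_by_partition = {}
    for line in describe_output.splitlines():
        match = PARTITION_LINE.search(line)
        if match is None:
            continue  # topic summary line or blank line
        partition = int(match.group(1))
        brokers = [int(b) for b in match.group(2).split(",") if b]
        isr_by_partition[partition] = sorted(brokers)
    return isr_by_partition


def isr_mismatches(zk_isr, metadata_isr):
    """Return {partition: (zk_isr, metadata_isr)} where the two views differ."""
    mismatches = {}
    for partition in sorted(set(zk_isr) | set(metadata_isr)):
        expected = sorted(zk_isr.get(partition, []))
        reported = sorted(metadata_isr.get(partition, []))
        if expected != reported:
            mismatches[partition] = (expected, reported)
    return mismatches


if __name__ == "__main__":
    # Hypothetical --describe output after broker2 has rejoined the ISR.
    describe_output = (
        "Topic:test\tPartitionCount:3\tReplicationFactor:2\tConfigs:\n"
        "\tTopic: test\tPartition: 0\tLeader: 0\tReplicas: 0,1\tIsr: 0,1\n"
        "\tTopic: test\tPartition: 1\tLeader: 0\tReplicas: 1,0\tIsr: 0,1\n"
        "\tTopic: test\tPartition: 2\tLeader: 0\tReplicas: 0,1\tIsr: 0,1\n"
    )
    # Hypothetical ISRs copied from the TopicMetadataResponse output.
    metadata_isr = {0: [0], 1: [0], 2: [0]}
    for partition, (zk, reported) in isr_mismatches(
        parse_describe_isr(describe_output), metadata_isr
    ).items():
        print("partition %d: ZK ISR %s, metadata ISR %s" % (partition, zk, reported))
```

An empty result from isr_mismatches means the two views agree. With the bug 
described here, every partition would be expected to show up in the result.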


Basically it appears that brokers hold incorrect ISR information in their 
metadata cache.
Our production servers exhibit the same problem - after a topic gets created 
everything looks fine, but as brokers get restarted, the ISR reported by the 
brokers is wrong, whereas the one in ZK appears to report the truth (it shrinks 
as brokers get shut down and grows back after they get restarted)

I'm not sure if this has wider impact on the functioning of the cluster - bad 
metadata information is bad - but so far there has been no evidence of that


  was:
TL;DR - after a topic is created, and at least one broker in the ISR is 
restarted, the ISR reported by the TopicMetadataResponse is incorrect.

Specific steps to repro:
- Download 0.8.1 Kafka
- Copy server.properties twice into server1.properties and server2.properties 
(attached) - basically just ports and log paths changed to allow brokers to 
co-exist
- Start ZooKeeper using "sh bin/zookeeper-server-start.sh 
config/zookeeper.properties"
- Start broker1: "sh bin/kafka-server-start.sh config/server1.properties"
- Start broker2: "sh bin/kafka-server-start.sh config/server2.properties"
- Create a new topic: "sh bin/kafka-topics.sh --zookeeper localhost:2181 
--create --topic test --replication-factor 2 --partitions 3"
- Examine topic state: "sh bin/kafka-topics.sh --zookeeper localhost:2181 
--describe --topic test" - note that all ISRs are of length 2
- Run the attached Scala code that uses TopicMetadataRequest to examine topic 
state. Observe that all ISRs are of length 2 and match the information output 
by the script
- Shut down broker2 (simply hit Ctrl-C in the terminal), wait 5-10 seconds
- Restart broker2 using the original command
- Check the status of the topic again. Observe that the leader for all 
partitions is 0 (as expected), and all ISRs contain both brokers (as expected)
- Run the attached Scala snippet again. 

EXPECTED:
- All ISRs are of length 2

ACTUAL:
- ALL ISRs contain just broker 0

NOTE: depending on how long broker2 was down, sometimes some ISRs will contain 
the full list, but shutting it down for 15+ secs seems to always yield a 
consistent repro




> ISR reported by TopicMetadataResponse most of the time doesn't match the 
> Zookeeper information (and the truth)
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-1557
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1557
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer, controller, core, replication
>    Affects Versions: 0.8.0, 0.8.1
>         Environment: OSX 10.9.3, Linux Scientific 6.5
> It actually doesn't seem to matter and appears to be OS-agnostic
>            Reporter: Oleg Lvovitch
>            Assignee: Neha Narkhede
>             Fix For: 0.8.1.1, 0.8.2
>
>         Attachments: BrokenKafkaLink.scala, server1.properties, 
> server2.properties
>
>
> TL;DR - after a topic is created, and at least one broker in the ISR is 
> restarted, the ISR reported by the TopicMetadataResponse is incorrect.
> Specific steps to repro:
> - Download 0.8.1 Kafka
> - Copy server.properties twice into server1.properties and server2.properties 
> (attached) - basically just ports and log paths changed to allow brokers to 
> co-exist
> - Start ZooKeeper using "sh bin/zookeeper-server-start.sh 
> config/zookeeper.properties"
> - Start broker1: "sh bin/kafka-server-start.sh config/server1.properties"
> - Start broker2: "sh bin/kafka-server-start.sh config/server2.properties"
> - Create a new topic: "sh bin/kafka-topics.sh --zookeeper localhost:2181 
> --create --topic test --replication-factor 2 --partitions 3"
> - Examine topic state: "sh bin/kafka-topics.sh --zookeeper localhost:2181 
> --describe --topic test" - note that all ISRs are of length 2
> - Run the attached Scala code that uses TopicMetadataRequest to examine topic 
> state. Observe that all ISRs are of length 2 and match the information 
> output by the script
> - Shut down broker2 (simply hit Ctrl-C in the terminal), wait 5-10 seconds
> - Restart broker2 using the original command
> - Check the status of the topic again. Observe that the leader for all 
> partitions is 0 (as expected), and all ISRs contain both brokers (as expected)
> - Run the attached Scala snippet again. 
> EXPECTED:
> - All ISRs are of length 2
> ACTUAL:
> - ALL ISRs contain just broker 0
> NOTE: depending on how long broker2 was down, sometimes some ISRs will 
> contain the full list, but shutting it down for 15+ secs seems to always 
> yield a consistent repro
> Basically it appears that brokers hold incorrect ISR information in their 
> metadata cache.
> Our production servers exhibit the same problem - after a topic gets created 
> everything looks fine, but as brokers get restarted, the ISR reported by the 
> brokers is wrong, whereas the one in ZK appears to report the truth (it 
> shrinks as brokers get shut down and grows back after they get restarted)
> I'm not sure if this has wider impact on the functioning of the cluster - bad 
> metadata information is bad - but so far there has been no evidence of that



--
This message was sent by Atlassian JIRA
(v6.2#6252)
