[jira] [Updated] (KAFKA-1557) ISR reported by TopicMetadataResponse most of the time doesn't match the Zookeeper information (and the truth)
[ https://issues.apache.org/jira/browse/KAFKA-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Oleg Lvovitch updated KAFKA-1557:
--------------------------------
    Attachment: server2.properties
                server1.properties

ISR reported by TopicMetadataResponse most of the time doesn't match the Zookeeper information (and the truth)
--------------------------------------------------------------------------------------------------------------

    Key: KAFKA-1557
    URL: https://issues.apache.org/jira/browse/KAFKA-1557
    Project: Kafka
    Issue Type: Bug
    Components: consumer, controller, core, replication
    Affects Versions: 0.8.0, 0.8.1
    Environment: OSX 10.9.3, Linux Scientific 6.5
                 It actually doesn't seem to matter and appears to be OS-agnostic
    Reporter: Oleg Lvovitch
    Assignee: Neha Narkhede
    Fix For: 0.8.1.1, 0.8.2
    Attachments: server1.properties, server2.properties

TL;DR - after a topic is created, and at least one broker in the ISR is restarted, the ISR reported by the TopicMetadataResponse is incorrect.

Specific steps to repro:
- Download 0.8.1 Kafka
- Copy server.properties twice into server1.properties and server2.properties (attached) - basically just ports and log paths changed to allow brokers to co-exist
- Start zookeeper using sh bin/zookeeper-server-start.sh config/zookeeper.properties
- Start broker1: sh bin/kafka-server-start.sh config/server1.properties
- Start broker2: sh bin/kafka-server-start.sh config/server2.properties
- Create a new topic: sh bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic test --replication-factor 2 --partitions 3
- Examine topic state: sh bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic test - note that all ISRs are of length 2
- Run the attached Scala code that uses TopicMetadataRequest to examine topic state. Observe that all ISRs are of length 2 and match the information output by the script
- Shut down broker2 (simply hit Ctrl-C in the terminal), wait 5-10 seconds
- Restart broker2 using the original command
- Check the status of the topic again. Observe that the leader for all partitions is 0 (as expected), and all ISRs contain both brokers (as expected)
- Run the attached Scala snippet again.

EXPECTED:
- All ISRs are of length 2

ACTUAL:
- All ISRs contain just broker 0

NOTE: depending on how long broker2 was down, sometimes some ISRs will contain the full list, but shutting it down for 15+ secs seems to always yield a consistent repro

--
This message was sent by Atlassian JIRA (v6.2#6252)
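The comparison in the steps above (ISR as shown by kafka-topics.sh --describe versus ISR as returned in a TopicMetadataResponse) can be automated. The following is a minimal Python sketch, not the attached Scala code. It assumes the partition-line layout that kafka-topics.sh printed in 0.8.x, assumes the two brokers have ids 0 and 1, and uses illustrative sample text rather than output captured from the reporter's cluster. The helper names parse_describe and isr_mismatches are hypothetical.

```python
import re

# One partition line of `kafka-topics.sh --describe` output.
# The layout is an assumption based on 0.8.x; adjust it if your version differs.
PARTITION_LINE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>-?\d+)\s+Replicas:\s*(?P<replicas>[\d,]*)\s+"
    r"Isr:\s*(?P<isr>[\d,]*)"
)


def parse_describe(text):
    """Return {partition_id: sorted ISR broker ids} parsed from --describe output."""
    isr_by_partition = {}
    for line in text.splitlines():
        match = PARTITION_LINE.search(line)
        if match:  # the topic summary line does not match and is skipped
            ids = [int(b) for b in match.group("isr").split(",") if b]
            isr_by_partition[int(match.group("partition"))] = sorted(ids)
    return isr_by_partition


def isr_mismatches(zk_view, metadata_view):
    """Return {partition: (zk_isr, metadata_isr)} for partitions whose ISRs differ."""
    mismatches = {}
    for partition in sorted(set(zk_view) | set(metadata_view)):
        zk_isr = sorted(zk_view.get(partition, []))
        md_isr = sorted(metadata_view.get(partition, []))
        if zk_isr != md_isr:
            mismatches[partition] = (zk_isr, md_isr)
    return mismatches


# Illustrative input only: shaped like the state described after broker2 restarts.
SAMPLE_DESCRIBE = """Topic:test\tPartitionCount:3\tReplicationFactor:2\tConfigs:
\tTopic: test\tPartition: 0\tLeader: 0\tReplicas: 0,1\tIsr: 0,1
\tTopic: test\tPartition: 1\tLeader: 0\tReplicas: 1,0\tIsr: 0,1
\tTopic: test\tPartition: 2\tLeader: 0\tReplicas: 0,1\tIsr: 0,1
"""

# ISRs as the metadata response is reported to show them (hypothetical values).
SAMPLE_METADATA_ISR = {0: [0], 1: [0], 2: [0]}

if __name__ == "__main__":
    zk_view = parse_describe(SAMPLE_DESCRIBE)
    for partition, (zk_isr, md_isr) in isr_mismatches(zk_view, SAMPLE_METADATA_ISR).items():
        print("partition %d: describe ISR %s, metadata ISR %s" % (partition, zk_isr, md_isr))
```

To use it against a real cluster, feed parse_describe the captured output of the describe command and build the second dictionary from whatever client issues the TopicMetadataRequest.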
[jira] [Updated] (KAFKA-1557) ISR reported by TopicMetadataResponse most of the time doesn't match the Zookeeper information (and the truth)
[ https://issues.apache.org/jira/browse/KAFKA-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Oleg Lvovitch updated KAFKA-1557:
--------------------------------
    Attachment: BrokenKafkaLink.scala

ISR reported by TopicMetadataResponse most of the time doesn't match the Zookeeper information (and the truth)
--------------------------------------------------------------------------------------------------------------

    Key: KAFKA-1557
    URL: https://issues.apache.org/jira/browse/KAFKA-1557
    Project: Kafka
    Issue Type: Bug
    Components: consumer, controller, core, replication
    Affects Versions: 0.8.0, 0.8.1
    Environment: OSX 10.9.3, Linux Scientific 6.5
                 It actually doesn't seem to matter and appears to be OS-agnostic
    Reporter: Oleg Lvovitch
    Assignee: Neha Narkhede
    Fix For: 0.8.1.1, 0.8.2
    Attachments: BrokenKafkaLink.scala, server1.properties, server2.properties
[jira] [Updated] (KAFKA-1557) ISR reported by TopicMetadataResponse most of the time doesn't match the Zookeeper information (and the truth)
[ https://issues.apache.org/jira/browse/KAFKA-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Oleg Lvovitch updated KAFKA-1557:
--------------------------------
    Description:
TL;DR - after a topic is created, and at least one broker in the ISR is restarted, the ISR reported by the TopicMetadataResponse is incorrect.

Specific steps to repro:
- Download 0.8.1 Kafka
- Copy server.properties twice into server1.properties and server2.properties (attached) - basically just ports and log paths changed to allow brokers to co-exist
- Start zookeeper using sh bin/zookeeper-server-start.sh config/zookeeper.properties
- Start broker1: sh bin/kafka-server-start.sh config/server1.properties
- Start broker2: sh bin/kafka-server-start.sh config/server2.properties
- Create a new topic: sh bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic test --replication-factor 2 --partitions 3
- Examine topic state: sh bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic test - note that all ISRs are of length 2
- Run the attached Scala code that uses TopicMetadataRequest to examine topic state. Observe that all ISRs are of length 2 and match the information output by the script
- Shut down broker2 (simply hit Ctrl-C in the terminal), wait 5-10 seconds
- Restart broker2 using the original command
- Check the status of the topic again. Observe that the leader for all partitions is 0 (as expected), and all ISRs contain both brokers (as expected)
- Run the attached Scala snippet again.

EXPECTED:
- All ISRs are of length 2

ACTUAL:
- All ISRs contain just broker 0

NOTE: depending on how long broker2 was down, sometimes some ISRs will contain the full list, but shutting it down for 15+ secs seems to always yield a consistent repro

Basically it appears that brokers have incorrect ISR information in their metadata cache. Our production servers exhibit the same problem - after a topic gets created everything looks fine, but as brokers get restarted, the ISR reported by the brokers is wrong, whereas the one in ZK appears to report the truth (it shrinks as brokers get shut down and grows back after they get restarted).

I'm not sure if this has a wider impact on the functioning of the cluster - bad metadata information is bad - but so far there has been no evidence of that.
[jira] [Updated] (KAFKA-1557) ISR reported by TopicMetadataResponse most of the time doesn't match the Zookeeper information (and the truth)
[ https://issues.apache.org/jira/browse/KAFKA-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joel Koshy updated KAFKA-1557:
-----------------------------
    Component/s:     (was: consumer)
         Labels: newbie++  (was: )

ISR reported by TopicMetadataResponse most of the time doesn't match the Zookeeper information (and the truth)
--------------------------------------------------------------------------------------------------------------

    Key: KAFKA-1557
    URL: https://issues.apache.org/jira/browse/KAFKA-1557
    Project: Kafka
    Issue Type: Bug
    Components: controller, core, replication
    Affects Versions: 0.8.0, 0.8.1
    Environment: OSX 10.9.3, Linux Scientific 6.5
                 It actually doesn't seem to matter and appears to be OS-agnostic
    Reporter: Oleg Lvovitch
    Assignee: Neha Narkhede
    Labels: newbie++
    Fix For: 0.8.1.1, 0.8.2
    Attachments: BrokenKafkaLink.scala, server1.properties, server2.properties