[jira] [Comment Edited] (KAFKA-972) MetadataRequest returns stale list of brokers
[ https://issues.apache.org/jira/browse/KAFKA-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626807#comment-14626807 ]

Ashish K Singh edited comment on KAFKA-972 at 7/14/15 6:22 PM:
---

Thanks [~junrao]!

was (Author: singhashish):
Thanks Jun!

MetadataRequest returns stale list of brokers
---

Key: KAFKA-972
URL: https://issues.apache.org/jira/browse/KAFKA-972
Project: Kafka
Issue Type: Bug
Components: core
Affects Versions: 0.8.0
Reporter: Vinicius Carvalho
Assignee: Ashish K Singh
Fix For: 0.8.3
Attachments: BrokerMetadataTest.scala, KAFKA-972.patch, KAFKA-972_2015-06-30_18:42:13.patch, KAFKA-972_2015-07-01_01:36:56.patch, KAFKA-972_2015-07-01_01:42:42.patch, KAFKA-972_2015-07-01_08:06:03.patch, KAFKA-972_2015-07-06_23:07:34.patch, KAFKA-972_2015-07-07_10:42:41.patch, KAFKA-972_2015-07-07_23:24:13.patch

When we issue a MetadataRequest to the cluster, the list of brokers is stale. Even when a broker is down, it is still returned to the client.
The following are examples of two invocations, one with both brokers online and the second with a broker down:

{
  brokers: [
    { nodeId: 0, host: 10.139.245.106, port: 9092, byteLength: 24 },
    { nodeId: 1, host: localhost, port: 9093, byteLength: 19 }
  ],
  topicMetadata: [
    {
      topicErrorCode: 0,
      topicName: foozbar,
      partitions: [
        { replicas: [ 0 ], isr: [ 0 ], partitionErrorCode: 0, partitionId: 0, leader: 0, byteLength: 26 },
        { replicas: [ 1 ], isr: [ 1 ], partitionErrorCode: 0, partitionId: 1, leader: 1, byteLength: 26 },
        { replicas: [ 0 ], isr: [ 0 ], partitionErrorCode: 0, partitionId: 2, leader: 0, byteLength: 26 },
        { replicas: [ 1 ], isr: [ 1 ], partitionErrorCode: 0, partitionId: 3, leader: 1, byteLength: 26 },
        { replicas: [ 0 ], isr: [ 0 ], partitionErrorCode: 0, partitionId: 4, leader: 0, byteLength: 26 }
      ],
      byteLength: 145
    }
  ],
  responseSize: 200,
  correlationId: -1000
}

{
  brokers: [
    { nodeId: 0, host: 10.139.245.106, port: 9092, byteLength: 24 },
    { nodeId: 1, host: localhost, port: 9093, byteLength: 19 }
  ],
  topicMetadata: [
    {
      topicErrorCode: 0,
      topicName: foozbar,
      partitions: [
        { replicas: [ 0 ], isr: [], partitionErrorCode: 5, partitionId: 0, leader: -1, byteLength: 22 },
        { replicas: [ 1 ], isr: [ 1 ], partitionErrorCode: 0, partitionId: 1, leader: 1, byteLength: 26 },
        { replicas: [ 0 ], isr: [], partitionErrorCode: 5, partitionId: 2, leader: -1,
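To make the symptom in the second response easier to spot, here is a minimal sketch of a client-side check. It is only a heuristic for illustration, not something the Kafka clients provide: the function name `suspect_brokers` and the flattened `partitions` key are assumptions, and the data contains only the two partitions that are fully visible in the second response above. Error code 5 is LeaderNotAvailable in the Kafka protocol.

```python
# Heuristic sketch (hypothetical helper, not part of any Kafka client):
# flag brokers that are still listed in the metadata response although
# they lead no partition and only show up as replicas of partitions
# that currently have no leader.

LEADER_NOT_AVAILABLE = 5  # Kafka protocol error code

# Simplified copy of the visible part of the second response above.
# The topicMetadata nesting is flattened into "partitions" for brevity.
response = {
    "brokers": [
        {"nodeId": 0, "host": "10.139.245.106", "port": 9092},
        {"nodeId": 1, "host": "localhost", "port": 9093},
    ],
    "partitions": [
        {"partitionId": 0, "replicas": [0], "isr": [],
         "partitionErrorCode": 5, "leader": -1},
        {"partitionId": 1, "replicas": [1], "isr": [1],
         "partitionErrorCode": 0, "leader": 1},
    ],
}


def suspect_brokers(resp):
    """Return sorted ids of listed brokers that look dead."""
    listed = {b["nodeId"] for b in resp["brokers"]}
    # Brokers that currently lead at least one partition are clearly alive.
    leading = {p["leader"] for p in resp["partitions"] if p["leader"] >= 0}
    # Replicas of leaderless partitions are candidates for being down.
    orphaned = set()
    for p in resp["partitions"]:
        if p["partitionErrorCode"] == LEADER_NOT_AVAILABLE:
            orphaned.update(p["replicas"])
    return sorted((orphaned & listed) - leading)


print(suspect_brokers(response))
```

On this data the check flags broker 0, which matches the report: the broker is down, yet it is still present in the `brokers` list.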
[jira] [Comment Edited] (KAFKA-972) MetadataRequest returns stale list of brokers
[ https://issues.apache.org/jira/browse/KAFKA-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600602#comment-14600602 ]

Ashish K Singh edited comment on KAFKA-972 at 6/25/15 3:10 AM:
---

Hey Guys,

I spent some time reproducing the issue and finding the root cause. Turns out KAFKA-1367 is not the issue here. Below is the problem and my suggested solution.

Problem: The alive brokers list is not being propagated to brokers by the coordinator. When a broker is started, it writes to the ZK brokers path. The coordinator watches that path and notices the new broker. On noticing a new broker, the coordinator sends the UpdateMetadataRequest only to the new broker that just started up. The other brokers in the cluster never get to know that there are new brokers in the cluster.

Effect of KAFKA-1367: After KAFKA-1367 goes in, correct alive brokers information will be propagated to all live brokers after ISR changes at any broker. However, if there are no topics/partitions, KAFKA-1367 will not help and this issue will still be there.

Solution: Instead of sending the UpdateMetadataRequest only to the new broker, send it to all live brokers in the cluster.

[~junrao], [~nehanarkhede], [~granthenke], [~gwenshap], [~charmalloc], [~jjkoshy] please provide your thoughts. I have a patch ready which I will post if you guys think this is indeed the correct approach. I have verified that the above approach fixes the issue.

was (Author: singhashish):
Hey Guys,

I spent some time reproducing the issue and finding the root cause. Turns out KAFKA-1367 is not the issue here. Below is the problem and my suggested solution.

Problem: The alive brokers list is not being propagated to brokers by the coordinator. When a broker is started, it writes to the ZK brokers path. The coordinator watches that path and notices the new broker. On noticing a new broker, the coordinator sends the UpdateMetadataRequest only to the new broker that just started up. The other brokers in the cluster never get to know that there are new brokers in the cluster.

Effect of KAFKA-1367: After KAFKA-1367 goes in, correct alive brokers information will be propagated to all live brokers after ISR changes at any broker. However, if there are no topics/partitions, KAFKA-1367 will not help and this issue will still be there.

Solution: Instead of sending the UpdateMetadataRequest only to the new broker, send it to all live brokers in the cluster.

[~junrao], [~nehanarkhede], [~granthenke], [~gwenshap], [~charmalloc], [~jjkoshy] please provide your thoughts. I have a patch ready which I will post if you guys think this is indeed the correct approach.

MetadataRequest returns stale list of brokers
---

Key: KAFKA-972
URL: https://issues.apache.org/jira/browse/KAFKA-972
Project: Kafka
Issue Type: Bug
Components: core
Affects Versions: 0.8.0
Reporter: Vinicius Carvalho
Assignee: Ashish K Singh
Attachments: BrokerMetadataTest.scala

When we issue a MetadataRequest to the cluster, the list of brokers is stale. Even when a broker is down, it is still returned to the client.

The following are examples of two invocations, one with both brokers online and the second with a broker down:

{
  brokers: [
    { nodeId: 0, host: 10.139.245.106, port: 9092, byteLength: 24 },
    { nodeId: 1, host: localhost, port: 9093, byteLength: 19 }
  ],
  topicMetadata: [
    {
      topicErrorCode: 0,
      topicName: foozbar,
      partitions: [
        { replicas: [ 0 ], isr: [ 0 ], partitionErrorCode: 0, partitionId: 0, leader: 0, byteLength: 26 },
        { replicas: [ 1 ], isr: [ 1 ], partitionErrorCode: 0, partitionId: 1, leader: 1, byteLength: 26 },
        { replicas: [ 0 ], isr: [ 0 ], partitionErrorCode: 0, partitionId: 2, leader: 0, byteLength: 26 },
        { replicas: [ 1
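The problem and the proposed solution from the comment in this message can be sketched as a toy simulation. This is not Kafka code and not the attached patch: the `Broker` and `Controller` classes, the `notify_all` flag, and `start_cluster` are hypothetical names chosen for illustration, and ZooKeeper is replaced by a direct method call. The comment calls the component that watches the ZK brokers path the coordinator; the sketch names it `Controller`.

```python
# Toy model (hypothetical classes, not Kafka's API) of how the alive
# brokers list reaches each broker's metadata cache on broker startup.


class Broker:
    """Stand-in for a broker holding a cached view of alive brokers."""

    def __init__(self, broker_id):
        self.broker_id = broker_id
        self.alive_brokers = set()

    def handle_update_metadata(self, alive_ids):
        # Replace the cached view with the one sent by the controller.
        self.alive_brokers = set(alive_ids)


class Controller:
    """Stand-in for the component that watches the ZK brokers path."""

    def __init__(self, notify_all):
        # notify_all=False models the reported behavior,
        # notify_all=True models the proposed solution.
        self.notify_all = notify_all
        self.brokers = {}

    def on_broker_startup(self, broker):
        self.brokers[broker.broker_id] = broker
        alive_ids = set(self.brokers)
        if self.notify_all:
            targets = list(self.brokers.values())
        else:
            targets = [broker]
        for target in targets:
            target.handle_update_metadata(alive_ids)


def start_cluster(notify_all):
    """Start brokers 0 and 1 in order and return them."""
    controller = Controller(notify_all)
    brokers = [Broker(0), Broker(1)]
    for broker in brokers:
        controller.on_broker_startup(broker)
    return brokers


old = start_cluster(notify_all=False)
new = start_cluster(notify_all=True)
print(sorted(old[0].alive_brokers))  # broker 0 never hears about broker 1
print(sorted(new[0].alive_brokers))  # broker 0 sees both brokers
```

With `notify_all=False`, broker 0 keeps the view it received at its own startup, so a MetadataRequest answered from its cache would be stale. With `notify_all=True`, every live broker receives the updated list.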