[jira] [Comment Edited] (KAFKA-972) MetadataRequest returns stale list of brokers

2015-07-14 Thread Ashish K Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626807#comment-14626807
 ] 

Ashish K Singh edited comment on KAFKA-972 at 7/14/15 6:22 PM:
---

Thanks [~junrao]!


was (Author: singhashish):
Thanks Jun!

 MetadataRequest returns stale list of brokers
 ----------------------------------------------

 Key: KAFKA-972
 URL: https://issues.apache.org/jira/browse/KAFKA-972
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8.0
Reporter: Vinicius Carvalho
Assignee: Ashish K Singh
 Fix For: 0.8.3

 Attachments: BrokerMetadataTest.scala, KAFKA-972.patch, 
 KAFKA-972_2015-06-30_18:42:13.patch, KAFKA-972_2015-07-01_01:36:56.patch, 
 KAFKA-972_2015-07-01_01:42:42.patch, KAFKA-972_2015-07-01_08:06:03.patch, 
 KAFKA-972_2015-07-06_23:07:34.patch, KAFKA-972_2015-07-07_10:42:41.patch, 
 KAFKA-972_2015-07-07_23:24:13.patch


 When we issue a MetadataRequest to the cluster, the list of brokers it returns is 
 stale: even when a broker is down, it is still returned to the client. The 
 following are examples of two invocations, one with both brokers online and 
 the second with a broker down:
 {
   "brokers": [
     { "nodeId": 0, "host": "10.139.245.106", "port": 9092, "byteLength": 24 },
     { "nodeId": 1, "host": "localhost", "port": 9093, "byteLength": 19 }
   ],
   "topicMetadata": [
     {
       "topicErrorCode": 0,
       "topicName": "foozbar",
       "partitions": [
         { "replicas": [0], "isr": [0], "partitionErrorCode": 0, "partitionId": 0, "leader": 0, "byteLength": 26 },
         { "replicas": [1], "isr": [1], "partitionErrorCode": 0, "partitionId": 1, "leader": 1, "byteLength": 26 },
         { "replicas": [0], "isr": [0], "partitionErrorCode": 0, "partitionId": 2, "leader": 0, "byteLength": 26 },
         { "replicas": [1], "isr": [1], "partitionErrorCode": 0, "partitionId": 3, "leader": 1, "byteLength": 26 },
         { "replicas": [0], "isr": [0], "partitionErrorCode": 0, "partitionId": 4, "leader": 0, "byteLength": 26 }
       ],
       "byteLength": 145
     }
   ],
   "responseSize": 200,
   "correlationId": -1000
 }
 {
   "brokers": [
     { "nodeId": 0, "host": "10.139.245.106", "port": 9092, "byteLength": 24 },
     { "nodeId": 1, "host": "localhost", "port": 9093, "byteLength": 19 }
   ],
   "topicMetadata": [
     {
       "topicErrorCode": 0,
       "topicName": "foozbar",
       "partitions": [
         { "replicas": [0], "isr": [], "partitionErrorCode": 5, "partitionId": 0, "leader": -1, "byteLength": 22 },
         { "replicas": [1], "isr": [1], "partitionErrorCode": 0, "partitionId": 1, "leader": 1, "byteLength": 26 },
         { "replicas": [0], "isr": [], "partitionErrorCode": 5, "partitionId": 2, "leader": -1,
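 A minimal reproduction sketch, assuming the 0.8.x Scala client API 
 (SimpleConsumer.send with a TopicMetadataRequest, and the brokers field on 
 TopicMetadataResponse); the host, port, and topic are the placeholder values 
 from the dump above:

{code:scala}
import kafka.api.TopicMetadataRequest
import kafka.consumer.SimpleConsumer

// Hypothetical repro: ask a broker that is still up for metadata and print
// the broker list it returns. With one broker down, the dead broker is
// expected to still show up here (the reported bug), while its partitions
// report partitionErrorCode 5 (LeaderNotAvailable) and leader -1.
object StaleBrokerListRepro {
  def main(args: Array[String]): Unit = {
    val consumer = new SimpleConsumer("10.139.245.106", 9092, 10000, 64 * 1024, "stale-broker-repro")
    try {
      val response = consumer.send(new TopicMetadataRequest(Seq("foozbar"), 0))
      response.brokers.foreach(b => println(s"broker ${b.id} at ${b.host}:${b.port}"))
    } finally {
      consumer.close()
    }
  }
}
{code}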

[jira] [Comment Edited] (KAFKA-972) MetadataRequest returns stale list of brokers

2015-06-24 Thread Ashish K Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600602#comment-14600602
 ] 

Ashish K Singh edited comment on KAFKA-972 at 6/25/15 3:10 AM:
---

Hey Guys,

I spent some time reproducing the issue and finding the root cause. It turns out 
KAFKA-1367 is not the issue here. Below are the problem and my suggested 
solution.

Problem:
The alive-brokers list is not propagated to existing brokers by the controller. When a 
broker starts up, it registers itself under the ZK brokers path. The controller watches 
that path and notices the new broker, but it then sends the UpdateMetadataRequest only 
to the new broker that just started. The other brokers in the cluster never learn that 
a new broker has joined.
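
For context, this is roughly what the controller watches: each broker writes an 
ephemeral registration under /brokers/ids on startup. A sketch of the 0.8.x 
registration format (the broker id, timestamp, and host/port values here are 
placeholders):

{code}
# Ephemeral znode created by broker 1 on startup (placeholder values):
# path: /brokers/ids/1
{"jmx_port":-1,"timestamp":"1435190400000","host":"localhost","version":1,"port":9093}
{code}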

Effect of KAFKA-1367: Once KAFKA-1367 goes in, the correct alive-brokers 
information will be propagated to all live brokers whenever the ISR changes at any 
broker. However, if there are no topics/partitions, KAFKA-1367 will not help 
and this issue will remain.

Solution:
Instead of sending the UpdateMetadataRequest only to the new broker, send it to all 
live brokers in the cluster.
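
To make the gap concrete, here is a self-contained toy model in Scala. This is 
not Kafka's controller code; Broker, Controller, and onBrokerStartup here are 
hypothetical stand-ins that only illustrate why updating just the new broker 
leaves every other broker's cached broker list stale, while fanning out to all 
live brokers keeps them consistent:

{code:scala}
object MetadataPropagationSketch {

  // Each broker caches the set of broker ids it currently believes are alive;
  // a MetadataRequest served by this broker is answered from that cache.
  final class Broker(val id: Int) {
    var aliveBrokers: Set[Int] = Set.empty
    def handleUpdateMetadataRequest(alive: Set[Int]): Unit = aliveBrokers = alive
  }

  final class Controller(fanOutToAll: Boolean) {
    private var live = Map.empty[Int, Broker]

    // Models the controller noticing a new registration under the ZK brokers
    // path. Old behavior: tell only the broker that just started. Proposed
    // behavior: tell every live broker.
    def onBrokerStartup(b: Broker): Unit = {
      live += b.id -> b
      val targets = if (fanOutToAll) live.values else Seq(b)
      targets.foreach(_.handleUpdateMetadataRequest(live.keySet))
    }
  }

  def main(args: Array[String]): Unit = {
    for (fanOutToAll <- Seq(false, true)) {
      val (b0, b1) = (new Broker(0), new Broker(1))
      val controller = new Controller(fanOutToAll)
      controller.onBrokerStartup(b0)
      controller.onBrokerStartup(b1)
      // With fanOutToAll = false, broker 0 still sees Set(0): a stale list.
      // With fanOutToAll = true, both brokers see Set(0, 1).
      println(s"fanOutToAll=$fanOutToAll broker0=${b0.aliveBrokers} broker1=${b1.aliveBrokers}")
    }
  }
}
{code}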

[~junrao], [~nehanarkhede], [~granthenke], [~gwenshap], [~charmalloc], 
[~jjkoshy], please share your thoughts. I have a patch ready, which I will post 
if you think this is indeed the correct approach. I have verified that the 
above approach fixes the issue.


was (Author: singhashish):
Hey Guys,

I spent some time reproducing the issue and finding the root cause. It turns out 
KAFKA-1367 is not the issue here. Below are the problem and my suggested 
solution.

Problem:
The alive-brokers list is not propagated to existing brokers by the controller. When a 
broker starts up, it registers itself under the ZK brokers path. The controller watches 
that path and notices the new broker, but it then sends the UpdateMetadataRequest only 
to the new broker that just started. The other brokers in the cluster never learn that 
a new broker has joined.

Effect of KAFKA-1367: Once KAFKA-1367 goes in, the correct alive-brokers 
information will be propagated to all live brokers whenever the ISR changes at any 
broker. However, if there are no topics/partitions, KAFKA-1367 will not help 
and this issue will remain.

Solution:
Instead of sending the UpdateMetadataRequest only to the new broker, send it to all 
live brokers in the cluster.

[~junrao], [~nehanarkhede], [~granthenke], [~gwenshap], [~charmalloc], 
[~jjkoshy], please share your thoughts. I have a patch ready, which I will post 
if you think this is indeed the correct approach.

 MetadataRequest returns stale list of brokers
 ----------------------------------------------

 Key: KAFKA-972
 URL: https://issues.apache.org/jira/browse/KAFKA-972
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8.0
Reporter: Vinicius Carvalho
Assignee: Ashish K Singh
 Attachments: BrokerMetadataTest.scala



[jira] [Comment Edited] (KAFKA-972) MetadataRequest returns stale list of brokers

2015-06-24 Thread Ashish K Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600602#comment-14600602
 ] 

Ashish K Singh edited comment on KAFKA-972 at 6/25/15 3:10 AM:
---

Hey Guys,

I spent some time reproducing the issue and finding the root cause. It turns out 
KAFKA-1367 is not the issue here. Below are the problem and my suggested 
solution.

Problem:
The alive-brokers list is not propagated to existing brokers by the controller. When a 
broker starts up, it registers itself under the ZK brokers path. The controller watches 
that path and notices the new broker, but it then sends the UpdateMetadataRequest only 
to the new broker that just started. The other brokers in the cluster never learn that 
a new broker has joined.

Effect of KAFKA-1367: Once KAFKA-1367 goes in, the correct alive-brokers 
information will be propagated to all live brokers whenever the ISR changes at any 
broker. However, if there are no topics/partitions, KAFKA-1367 will not help 
and this issue will remain.

Solution:
Instead of sending the UpdateMetadataRequest only to the new broker, send it to all 
live brokers in the cluster.

[~junrao], [~nehanarkhede], [~granthenke], [~gwenshap], [~charmalloc], 
[~jjkoshy], please share your thoughts. I have a patch ready, which I will post 
if you think this is indeed the correct approach.


was (Author: singhashish):
Hey Guys,

I spent some time reproducing the issue and finding the root cause. It turns out 
KAFKA-1367 is not the issue here. Below are the problem and my suggested 
solution.

Problem: The alive-brokers list is not propagated to existing brokers by the controller. 
When a broker starts up, it registers itself under the ZK brokers path. The controller 
watches that path and notices the new broker, but it then sends the UpdateMetadataRequest 
only to the new broker that just started. The other brokers in the cluster never learn 
that a new broker has joined.

Effect of KAFKA-1367: Once KAFKA-1367 goes in, the correct alive-brokers 
information will be propagated to all live brokers whenever the ISR changes at any 
broker. However, if there are no topics/partitions, KAFKA-1367 will not help 
and this issue will remain.

Solution: Instead of sending the UpdateMetadataRequest only to the new broker, send 
it to all live brokers in the cluster.

[~junrao], [~nehanarkhede], [~granthenke], [~gwenshap], [~charmalloc], 
[~jjkoshy], please share your thoughts. I have a patch ready, which I will post 
if you think this is indeed the correct approach.

 MetadataRequest returns stale list of brokers
 ----------------------------------------------

 Key: KAFKA-972
 URL: https://issues.apache.org/jira/browse/KAFKA-972
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8.0
Reporter: Vinicius Carvalho
Assignee: Ashish K Singh
 Attachments: BrokerMetadataTest.scala

