[ 
https://issues.apache.org/jira/browse/KAFKA-13653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen updated KAFKA-13653:
------------------------------
    Description: 
Currently, the client updates its metadata in 2 situations:
 # partition leader change
 # metadata expired (default 5 mins)

But sometimes the client and the brokers are started at the same time, and the 
client might discover only some of the brokers at first. If the discovered 
brokers then go down within 5 mins (before the metadata expires), the client 
has no chance to update its metadata and, from its point of view, the whole 
cluster is down.

Ex:

1. brokerA is up

2. producer is up, discovers brokerA, and updates its metadata

3. brokerB and brokerC are up (but the producer doesn't know about them, and 
the leader imbalance check interval (5 mins default) has not elapsed)

4. producer keeps producing data without error

5. brokerA goes down, let's say, 3 mins after the producer started

6. Now the whole cluster is unusable for the producer, even though brokerB and 
brokerC are up

 

When there are no nodes left to connect to, we should proactively discover 
alive brokers via the bootstrap server config. So, in the above example, if 
bootstrap.servers is set to "brokerA_IP,brokerB_IP,brokerC_IP", then we should 
be able to discover brokerB and brokerC after step 6.
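
A minimal sketch of the proposed fallback, written as a plain Java model and 
not as actual client code. The class and method names are hypothetical, and 
broker liveness is passed in as a set only for illustration; a real client 
would find out by trying to connect.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Illustrative model only: none of these names exist in the Kafka client.
class BootstrapFallbackSketch {

    // Keeps only the candidates that are currently alive (stand-in for a connection attempt).
    static List<String> reachable(List<String> candidates, Set<String> aliveBrokers) {
        List<String> result = new ArrayList<>();
        for (String node : candidates) {
            if (aliveBrokers.contains(node)) {
                result.add(node);
            }
        }
        return result;
    }

    // Chooses the nodes to ask for fresh metadata.
    // Nodes from the cached metadata are preferred; the bootstrap list is
    // consulted only when none of the cached nodes can be reached.
    static List<String> nodesForMetadataRequest(List<String> knownNodes,
                                                String bootstrapServers,
                                                Set<String> aliveBrokers) {
        List<String> fromCache = reachable(knownNodes, aliveBrokers);
        if (!fromCache.isEmpty()) {
            return fromCache;
        }
        // Proposed behaviour: fall back to the configured bootstrap servers.
        List<String> bootstrapNodes = new ArrayList<>();
        for (String entry : bootstrapServers.split(",")) {
            String trimmed = entry.trim();
            if (!trimmed.isEmpty()) {
                bootstrapNodes.add(trimmed);
            }
        }
        return reachable(bootstrapNodes, aliveBrokers);
    }

    public static void main(String[] args) {
        String bootstrapServers = "brokerA_IP,brokerB_IP,brokerC_IP";
        // Steps 1-2: the producer only learned about brokerA.
        List<String> knownNodes = List.of("brokerA_IP");
        // Steps 3-5: brokerB and brokerC came up, then brokerA went down.
        Set<String> aliveBrokers = Set.of("brokerB_IP", "brokerC_IP");

        // Step 6 with the fallback: prints [brokerB_IP, brokerC_IP]
        System.out.println(nodesForMetadataRequest(knownNodes, bootstrapServers, aliveBrokers));
    }
}
```

Without the fallback branch, the same inputs give an empty list, which is the 
situation described in step 6.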

  was:
Currently, metadata update has 2 ways:
 # partition leader change
 # metadata expired (default 5 mins)

But sometimes, we will start the client and the brokers at the same time. The 
client might discover only partial of the brokers at first. And when the 
discovered brokers down accidentally within 5 mins (before metadata expired), 
there would be no chance to update the metadata and all cluster is down.

Ex:

1. brokerA is up

2. producer is up, discovered brokerA, update its metadata

3. brokerB, brokerC are up (but producer doesn't know, and leader imbalance 
check is not expired (5 mins default))

4. producer keeps producing data without error

5. brokerA down, let's say, in 3 mins after producer started

6. Now, all cluster won't work even though brokerB and brokerC are up

 

We should proactively discover active brokers when there are no nodes to 
connect via the bootstrap server config. So, in the above example, if the 
bootstrap.server is set to "brokerA_IP,brokerB_IP,brokerC_IP", then we should 
be able to discover the brokerB and brokerC after step 6.


> Proactively discover alive brokers from bootstrap server lists when all nodes 
> are down
> --------------------------------------------------------------------------------------
>
>                 Key: KAFKA-13653
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13653
>             Project: Kafka
>          Issue Type: Improvement
>          Components: clients
>    Affects Versions: 3.1.0
>            Reporter: Luke Chen
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
