Luke Chen created KAFKA-13653:
---------------------------------

             Summary: Proactively discover alive brokers from bootstrap server 
lists when all nodes are down
                 Key: KAFKA-13653
                 URL: https://issues.apache.org/jira/browse/KAFKA-13653
             Project: Kafka
          Issue Type: Improvement
          Components: clients
    Affects Versions: 3.1.0
            Reporter: Luke Chen


Currently, a metadata update is triggered in 2 ways:
 # partition leader change
 # metadata expiry (default 5 mins)
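
For reference, the 5-minute expiry corresponds to the client setting
metadata.max.age.ms (default 300000 ms). Below is a minimal sketch of a
producer configuration that lowers it. The host names, ports, and the 60000
value are assumptions for illustration only. A shorter expiry only narrows the
window in which the client knows too few brokers; it does not remove it.

```java
import java.util.Properties;

// Illustrative sketch only: host names, ports and the chosen value are assumptions.
public class ProducerConfigSketch {

    // Builds client properties with a shorter metadata expiry than the default.
    static Properties build() {
        Properties props = new Properties();
        // All three brokers are listed, although only brokerA may be up at startup.
        props.put("bootstrap.servers", "brokerA:9092,brokerB:9092,brokerC:9092");
        // Default is 300000 (5 mins); 60000 forces a metadata refresh every minute.
        props.put("metadata.max.age.ms", "60000");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build().getProperty("metadata.max.age.ms"));
    }
}
```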

But sometimes the client and the brokers are started at the same time, and the 
client might discover only some of the brokers at first. If the discovered 
brokers then go down unexpectedly within 5 mins (before the metadata expires), 
the client has no chance to update its metadata, and from the client's point 
of view the whole cluster is down.

Ex:

1. brokerA is up

2. producer is up, discovers brokerA, and updates its metadata

3. brokerB and brokerC come up (but the producer doesn't know, and the leader 
imbalance check has not expired (5 mins default))

4. producer keeps producing data without error

5. brokerA goes down, let's say, 3 mins after the producer started

6. Now the whole cluster is unusable for the producer even though brokerB and 
brokerC are up

 

We should proactively discover active brokers via the bootstrap server config 
when there are no nodes left to connect to. So, in the above example, if 
bootstrap.servers is set to "brokerA_IP,brokerB_IP,brokerC_IP", then we should 
be able to discover brokerB and brokerC after step 6.
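
The proposed behaviour could be sketched roughly as below. This is a
hypothetical, self-contained illustration and not the Kafka client's actual
node selection code: the class BootstrapFallback, the method pickNode, and the
"reachable" set (which stands in for a real connection attempt) are all
assumptions made for the example.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch, not Kafka client code: all names are illustrative only.
public class BootstrapFallback {

    // Picks a node to send the next metadata request to.
    //   knownNodes: brokers learned from the last metadata response
    //   bootstrap:  addresses taken from the bootstrap.servers config
    //   reachable:  stands in for a real connection check
    static Optional<String> pickNode(List<String> knownNodes,
                                     List<String> bootstrap,
                                     Set<String> reachable) {
        // Normal path: use any known broker that is still reachable.
        for (String node : knownNodes) {
            if (reachable.contains(node)) {
                return Optional.of(node);
            }
        }
        // Proposed fallback: every known broker is down, so retry the bootstrap list.
        for (String node : bootstrap) {
            if (reachable.contains(node)) {
                return Optional.of(node);
            }
        }
        // Nothing is reachable at all.
        return Optional.empty();
    }

    public static void main(String[] args) {
        // State after step 6: only brokerA is known, but only brokerB and brokerC are up.
        List<String> known = List.of("brokerA:9092");
        List<String> bootstrap = List.of("brokerA:9092", "brokerB:9092", "brokerC:9092");
        Set<String> up = Set.of("brokerB:9092", "brokerC:9092");
        System.out.println(pickNode(known, bootstrap, up));
    }
}
```

With only the first loop (the current behaviour as described above), the
producer in the example finds no usable node after step 6. With the fallback,
it can reach brokerB through the bootstrap list and refresh its metadata from
there.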



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
