[ https://issues.apache.org/jira/browse/KAFKA-13653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Luke Chen updated KAFKA-13653:
------------------------------
    Description:
Currently, client metadata is updated in 2 situations:
# partition leader change
# metadata expired (default 5 mins)

But sometimes the client and the brokers are started at the same time, and the client might discover only some of the brokers at first. If the discovered brokers then go down unexpectedly within 5 mins (before the metadata expires), the client has no chance to update its metadata, and from the client's point of view the whole cluster is down.

Ex:
1. brokerA is up
2. producer is up, discovers brokerA, and updates its metadata
3. brokerB and brokerC are up (but the producer doesn't know, and the leader imbalance check has not expired (5 mins default))
4. producer keeps producing data without error
5. brokerA goes down, let's say, 3 mins after the producer started
6. Now the whole cluster is unusable for the producer even though brokerB and brokerC are up

We should proactively discover alive brokers via the bootstrap server config when there are no nodes to connect to. So, in the above example, if bootstrap.servers is set to "brokerA_IP,brokerB_IP,brokerC_IP", the producer should be able to discover brokerB and brokerC after step 6.

  was:
Currently, metadata is updated in 2 ways:
# partition leader change
# metadata expired (default 5 mins)

But sometimes the client and the brokers are started at the same time, and the client might discover only some of the brokers at first. If the discovered brokers then go down unexpectedly within 5 mins (before the metadata expires), the client has no chance to update its metadata, and from the client's point of view the whole cluster is down.

Ex:
1. brokerA is up
2. producer is up, discovers brokerA, and updates its metadata
3. brokerB and brokerC are up (but the producer doesn't know, and the leader imbalance check has not expired (5 mins default))
4. producer keeps producing data without error
5. brokerA goes down, let's say, 3 mins after the producer started
6. Now the whole cluster is unusable for the producer even though brokerB and brokerC are up

We should proactively discover alive brokers via the bootstrap server config when there are no nodes to connect to. So, in the above example, if bootstrap.servers is set to "brokerA_IP,brokerB_IP,brokerC_IP", the producer should be able to discover brokerB and brokerC after step 6.

> Proactively discover alive brokers from bootstrap server lists when all nodes are down
> --------------------------------------------------------------------------------------
>
>                 Key: KAFKA-13653
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13653
>             Project: Kafka
>          Issue Type: Improvement
>          Components: clients
>    Affects Versions: 3.1.0
>            Reporter: Luke Chen
>            Priority: Major

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
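
The fallback proposed in the description can be sketched as a small, self-contained simulation. This is only an illustration of the idea, not Kafka client code: the class `BootstrapFallbackSketch` and its methods `updateMetadata` and `pickNode` are hypothetical names, and broker liveness is passed in as a plain set instead of being detected over the network.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch: none of these names exist in the Kafka client.
final class BootstrapFallbackSketch {
    // Static broker list, standing in for the bootstrap.servers config value.
    private final List<String> bootstrapServers;
    // Brokers learned from the most recent (simulated) metadata response.
    private final Set<String> knownNodes = new LinkedHashSet<>();

    BootstrapFallbackSketch(List<String> bootstrapServers) {
        this.bootstrapServers = new ArrayList<>(bootstrapServers);
    }

    // Replace the cached cluster view, as a metadata update would.
    void updateMetadata(List<String> nodes) {
        knownNodes.clear();
        knownNodes.addAll(nodes);
    }

    // Prefer a live node from cached metadata. If every cached node is down,
    // fall back to the bootstrap list instead of giving up.
    Optional<String> pickNode(Set<String> aliveBrokers) {
        for (String node : knownNodes) {
            if (aliveBrokers.contains(node)) {
                return Optional.of(node);
            }
        }
        for (String server : bootstrapServers) {
            if (aliveBrokers.contains(server)) {
                return Optional.of(server);
            }
        }
        return Optional.empty();
    }
}
```

Replaying the example under these assumptions: construct the sketch with brokerA, brokerB and brokerC as bootstrap servers, call `updateMetadata` with only brokerA (step 2), then call `pickNode` with only brokerB and brokerC alive (step 6). The cached metadata has no live node, so the lookup falls through to the bootstrap list and returns brokerB, from which a real client could then refresh its metadata.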