kehuum opened a new pull request #11621: URL: https://github.com/apache/kafka/pull/11621
[LI-HOTFIX] Resolve the bootstrap server when cluster metadata hasn't been refreshed for a long time

This patch adds a config, li.client.cluster.metadata.expire.time.ms, which controls the maximum time cluster metadata can remain unchanged. On NetworkClient.poll, if this timeout has been reached and the client has tried half of the nodes in the original cached node set and failed, it resolves the bootstrap servers again and uses the newly resolved nodes to pick a leastLoadedNode to send the updateMetadataRequest to.

This is to avoid the following two scenarios:

- The consumer has been idle for a long time and the whole cluster has been swapped. In this case, all the cached nodes are invalid and the bootstrap servers need to be resolved again.
- The consumer hasn't refreshed metadata for a long time and some brokers in the cluster have been moved to another cluster. The client randomly picks one of the moved brokers to send the metadata request to and gets a response for a different cluster. In this case, we simply reject the stale metadata response and resolve the bootstrap servers when the conditions above are met.

TICKET = LI_DESCRIPTION = LIKAFKA-40759, EXIT_CRITERIA = MANUAL

This is not going to be merged upstream.

*Summary of testing strategy (including rationale) for the feature or bug fix. Unit and/or integration tests are expected for any behaviour change and system tests should be considered for larger changes.*

### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org
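
For readers following the description above, the re-resolution condition could be sketched roughly as below. This is only an illustration, not the code in this PR: the class `BootstrapReresolvePolicy`, its constructor, and `shouldReresolve` are hypothetical names, and reading "half of the nodes" as "at least half" is an assumption. Only the config key comes from the description.

```java
import java.util.Properties;

/** Hypothetical helper for illustration; not the actual NetworkClient change in this PR. */
final class BootstrapReresolvePolicy {
    // Config key as named in the PR description.
    static final String EXPIRE_CONFIG = "li.client.cluster.metadata.expire.time.ms";

    private final long expireTimeMs;

    // defaultExpireTimeMs is an assumed fallback; the PR description does not state a default.
    BootstrapReresolvePolicy(Properties props, long defaultExpireTimeMs) {
        String value = props.getProperty(EXPIRE_CONFIG);
        this.expireTimeMs = (value == null) ? defaultExpireTimeMs : Long.parseLong(value);
    }

    /**
     * Returns true when both conditions from the description hold:
     * the metadata has been unchanged for at least the expire time, and
     * at least half of the originally cached nodes were tried and failed
     * ("at least half" is an assumed reading).
     */
    boolean shouldReresolve(long nowMs, long lastMetadataChangeMs,
                            int failedNodes, int cachedNodes) {
        boolean expired = nowMs - lastMetadataChangeMs >= expireTimeMs;
        boolean halfFailed = cachedNodes > 0 && 2L * failedNodes >= cachedNodes;
        return expired && halfFailed;
    }
}
```

In this sketch a caller would check `shouldReresolve(...)` on each poll and, when it returns true, re-resolve the bootstrap servers before choosing a node for the metadata request.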