[ https://issues.apache.org/jira/browse/KAFKA-5115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872476#comment-16872476 ]
Sönke Liebau commented on KAFKA-5115:
-------------------------------------

Hi [~MiniMizer], we just discussed this today. While the change itself would be fairly simple, I believe there are several areas that would need investigation and testing before this could be recommended for a production deployment. Specifically, everything around transactions and idempotent producers seems to me to be worth a dedicated look. On the consumer side, the immediate concern is offsets: stored offsets might not create issues (but may also not work), but anything cached inside the Fetcher could cause havoc. Bottom line: it is a good idea that I'd fully support, but it probably needs more work than is immediately apparent.

> Use bootstrap.servers to refresh metadata
> -----------------------------------------
>
>                 Key: KAFKA-5115
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5115
>             Project: Kafka
>          Issue Type: Improvement
>    Affects Versions: 0.10.2.0
>            Reporter: Dan
>            Priority: Major
>
> Currently, it seems that the bootstrap.servers list is used only when the
> producer starts, to discover the cluster; subsequent metadata refreshes
> go directly to the discovered brokers.
> We would like to use the bootstrap.servers list for metadata refresh to
> support a failover mechanism by providing a VIP which can dynamically
> redirect requests to a secondary Kafka cluster if the primary is down.
> Consider the following use case, where "kafka-cluster.local" is a VIP on a
> load balancer with priority server pools that point to two different Kafka
> clusters (so when all servers of cluster #1 are down, it automatically
> redirects to servers from cluster #2).
> bootstrap.servers: kafka-cluster.local:9092
> 1) Producer starts, connects to kafka-cluster.local and discovers all servers
> from cluster #1
> 2) Producer starts producing to cluster #1
> 3) Cluster #1 goes down
> 4) Producer detects the failure and refreshes metadata from kafka-cluster.local
> (which now returns nodes from cluster #2)
> 5) Producer starts producing to cluster #2
> 6) Cluster #1 is brought back online, and kafka-cluster.local now points to
> it again
> In the current state, it seems that the producer will never revert to cluster
> #1, because it continues to refresh its metadata from the brokers of cluster
> #2, even though kafka-cluster.local no longer points to that cluster.
> If we could force the metadata refresh to happen against
> "kafka-cluster.local", it would enable automatic failover and failback
> between the clusters.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
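For context, the setup described above amounts to pointing bootstrap.servers at the VIP and tuning the metadata refresh interval. A minimal sketch of such a producer configuration follows; the VIP host and port are the example values from the issue, `metadata.max.age.ms` is the standard producer setting that controls how often metadata is re-fetched, and the 30-second value is an illustrative choice, not a recommendation. Note that, as the issue points out, these refreshes currently go to already-discovered brokers rather than back to the bootstrap list.

```java
import java.util.Properties;

public class FailoverProducerConfig {

    public static Properties buildConfig() {
        Properties props = new Properties();
        // VIP from the reported use case; a load balancer behind this name
        // resolves to whichever cluster's server pool is currently live
        props.setProperty("bootstrap.servers", "kafka-cluster.local:9092");
        // Refresh metadata at least every 30s (default is 5 minutes); today
        // these refreshes query the discovered brokers, not the bootstrap
        // list -- changing that is exactly what this issue requests
        props.setProperty("metadata.max.age.ms", "30000");
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }

    public static void main(String[] args) {
        Properties props = buildConfig();
        System.out.println("bootstrap.servers = "
                + props.getProperty("bootstrap.servers"));
    }
}
```

This configuration alone does not achieve failback: once the producer has discovered cluster #2's brokers, later refreshes keep going to them, which is the behavior the issue asks to change.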