[
https://issues.apache.org/jira/browse/ZOOKEEPER-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated ZOOKEEPER-5014:
--------------------------------------
Labels: pull-request-available (was: )
> Cache resolved IP addresses as fallback for DNS server failures
> ---------------------------------------------------------------
>
> Key: ZOOKEEPER-5014
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-5014
> Project: ZooKeeper
> Issue Type: Improvement
> Components: java client
> Affects Versions: 3.9.4
> Reporter: hoyong.eom
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> h3. Background
> When a ZooKeeper client needs to reconnect, it resolves hostnames via DNS.
> If the DNS server is temporarily unavailable, the client cannot resolve
> hostnames and fails to reconnect - even if the ZooKeeper servers are healthy
> and IP addresses haven't changed.
> *Note:* This is different from
> [ZOOKEEPER-4921|https://issues.apache.org/jira/browse/ZOOKEEPER-4921]
> (network failure reconnection bug in 3.9.3, fixed in 3.9.4).
> This proposal addresses DNS server outages specifically.
> h3. Problem
> * Client has successfully connected before (DNS was working)
> * DNS server becomes temporarily unavailable (maintenance, restart, network
> issue)
> * Client tries to reconnect, but DNS resolution fails
> * Connection fails even though the ZooKeeper server's IP hasn't changed
> h3. Proposal
> Cache successfully resolved IP addresses and use them as fallback when DNS
> resolution fails.
> * New option: {{zookeeper.client.dnsFallback.enabled}} (default: {{false}})
> * On successful DNS resolution: cache the IP address
> * On DNS failure with the option enabled: use the cached IP as a fallback
> * Backward compatible (disabled by default)
> h3. Use Cases
> * On-premise environments with unstable DNS infrastructure
> * Environments where server IP addresses rarely change
> * Temporary DNS outages (maintenance windows, DNS server restarts)
> h3. Implementation
> *ZKClientConfig.java:*
> {code:java}
> public static final String ZOOKEEPER_DNS_FALLBACK_ENABLED =
>     "zookeeper.client.dnsFallback.enabled";
> public static final boolean ZOOKEEPER_DNS_FALLBACK_ENABLED_DEFAULT = false;
>
> public boolean isDnsFallbackEnabled() {
>     return getBoolean(ZOOKEEPER_DNS_FALLBACK_ENABLED,
>         ZOOKEEPER_DNS_FALLBACK_ENABLED_DEFAULT);
> }
> {code}
> *StaticHostProvider.java:*
> {code:java}
> // Last successfully resolved address per hostname
> private final Map<String, InetAddress> resolvedAddressCache =
>     new ConcurrentHashMap<>();
>
> private InetSocketAddress resolve(InetSocketAddress address) {
>     String hostString = address.getHostString();
>     try {
>         InetAddress resolved = resolver.getAllByName(hostString)[0];
>         // Cache on success
>         resolvedAddressCache.put(hostString, resolved);
>         return new InetSocketAddress(resolved, address.getPort());
>     } catch (UnknownHostException e) {
>         // Fallback to cached IP if enabled
>         if (clientConfig.isDnsFallbackEnabled()) {
>             InetAddress cached = resolvedAddressCache.get(hostString);
>             if (cached != null) {
>                 LOG.warn("DNS failed for {}, using cached IP {}", hostString,
>                     cached);
>                 return new InetSocketAddress(cached, address.getPort());
>             }
>         }
>         // Fallback disabled or nothing cached: return the address unresolved
>         return address;
>     }
> }
> {code}
> h3. Usage
> {code}
> zookeeper.client.dnsFallback.enabled=true
> {code}
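
The effect of the option during an outage can be checked without a live ensemble. The following is a minimal standalone sketch of the fallback logic described in the issue, not ZooKeeper code: the class `DnsFallbackSketch` and its `Resolver` interface are hypothetical names introduced for illustration, and the empty-result guard is an added assumption that the issue's snippet does not include.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical standalone model of the proposed fallback; not part of ZooKeeper. */
class DnsFallbackSketch {

    /** Hypothetical stand-in for the client's resolver hook, so DNS can be faked. */
    interface Resolver {
        InetAddress[] getAllByName(String host) throws UnknownHostException;
    }

    // Last successfully resolved address per hostname.
    private final Map<String, InetAddress> cache = new ConcurrentHashMap<>();
    private final Resolver resolver;
    private final boolean fallbackEnabled;

    DnsFallbackSketch(Resolver resolver, boolean fallbackEnabled) {
        this.resolver = resolver;
        this.fallbackEnabled = fallbackEnabled;
    }

    InetSocketAddress resolve(InetSocketAddress address) {
        String host = address.getHostString();
        try {
            InetAddress[] all = resolver.getAllByName(host);
            if (all.length == 0) {
                // Assumption: treat an empty result like "nothing to use".
                return address;
            }
            cache.put(host, all[0]);
            return new InetSocketAddress(all[0], address.getPort());
        } catch (UnknownHostException e) {
            // Use the cached address only when the option is on and an entry exists.
            InetAddress cached = fallbackEnabled ? cache.get(host) : null;
            return cached != null
                ? new InetSocketAddress(cached, address.getPort())
                : address;
        }
    }

    public static void main(String[] args) throws Exception {
        // Built from raw bytes, so no real DNS lookup happens here.
        InetAddress ip = InetAddress.getByAddress("zk1.example.com", new byte[] {10, 0, 0, 1});
        boolean[] dnsUp = {true};
        Resolver flaky = host -> {
            if (!dnsUp[0]) {
                throw new UnknownHostException(host);
            }
            return new InetAddress[] {ip};
        };

        DnsFallbackSketch sketch = new DnsFallbackSketch(flaky, true);
        InetSocketAddress server = InetSocketAddress.createUnresolved("zk1.example.com", 2181);

        System.out.println("DNS up:   " + sketch.resolve(server));
        dnsUp[0] = false; // simulate the outage
        System.out.println("DNS down: " + sketch.resolve(server));
    }
}
```

With the flag passed as `false`, or with a client that never resolved the host before the outage, `resolve` returns the unresolved address, which matches the "disabled by default" and "has successfully connected before" conditions stated in the issue.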