[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18051248#comment-18051248
 ] 

Andor Molnar commented on ZOOKEEPER-5014:
-----------------------------------------

It's not ZooKeeper's responsibility to deal with DNS server outages. When the 
primary DNS server is temporarily unavailable, the secondary DNS server should 
handle the requests. To avoid sending too many requests to DNS servers, 
operating systems and the JVM itself cache DNS responses, taking into account 
the TTL attribute of DNS entries.
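
For reference, the JVM side of this caching is governed by two security 
properties, networkaddress.cache.ttl and networkaddress.cache.negative.ttl 
(both in seconds). Below is a minimal sketch of setting them programmatically. 
The class name and the values 60 and 10 are illustrative assumptions, not 
recommendations, and the properties generally have to be set before the first 
lookup to take effect.

```java
import java.security.Security;

class DnsCacheTtlSketch {
    /**
     * Sets the JVM-wide lifetimes (in seconds) for cached successful and
     * failed name lookups. Hypothetical helper, not part of ZooKeeper.
     */
    static void configure(int positiveTtlSeconds, int negativeTtlSeconds) {
        Security.setProperty("networkaddress.cache.ttl",
                Integer.toString(positiveTtlSeconds));
        Security.setProperty("networkaddress.cache.negative.ttl",
                Integer.toString(negativeTtlSeconds));
    }

    public static void main(String[] args) {
        // Example values only: keep successful lookups for 60s, failures for 10s.
        configure(60, 10);
        System.out.println(Security.getProperty("networkaddress.cache.ttl"));
        System.out.println(Security.getProperty("networkaddress.cache.negative.ttl"));
    }
}
```

The same properties can also be set in the JRE's java.security file, which 
avoids any ordering concerns inside the application.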

The implementation is not a real cache: it stores the value, but only re-uses 
it in case of an error.

I think the error detection is also wrong: UnknownHostException is also thrown 
when the DNS entry has been deleted, in which case we should throw an error 
rather than use an outdated IP address.
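
To illustrate the concern, here is a small self-contained sketch. It is a 
simplified stand-in for the proposed logic, not the code from the pull request: 
the Resolver interface, the class name, the host name and the 10.0.0.1 address 
are all hypothetical. The stub resolver simulates a deleted DNS entry by 
throwing UnknownHostException, and the fallback still hands back the old 
address, because the exception alone does not say why resolution failed.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class FallbackAmbiguitySketch {
    /** Hypothetical stand-in for the client's resolver; not a ZooKeeper type. */
    interface Resolver {
        InetAddress resolve(String host) throws UnknownHostException;
    }

    private final Map<String, InetAddress> lastKnown = new ConcurrentHashMap<>();
    private final Resolver resolver;

    FallbackAmbiguitySketch(Resolver resolver) {
        this.resolver = resolver;
    }

    /** Remembers the address on success and reuses it on UnknownHostException. */
    InetAddress resolveWithFallback(String host) throws UnknownHostException {
        try {
            InetAddress resolved = resolver.resolve(host);
            lastKnown.put(host, resolved);
            return resolved;
        } catch (UnknownHostException e) {
            InetAddress stale = lastKnown.get(host);
            if (stale != null) {
                return stale; // cannot tell "DNS down" from "entry deleted" here
            }
            throw e;
        }
    }

    /** Resolves once successfully, then simulates the entry being deleted. */
    static String demo() {
        try {
            // getByAddress builds the address from raw bytes; no DNS lookup happens.
            InetAddress old = InetAddress.getByAddress(
                    "zk1.example.com", new byte[] {10, 0, 0, 1});
            boolean[] entryDeleted = {false};
            FallbackAmbiguitySketch sketch = new FallbackAmbiguitySketch(host -> {
                if (entryDeleted[0]) {
                    throw new UnknownHostException(host);
                }
                return old;
            });
            sketch.resolveWithFallback("zk1.example.com");
            entryDeleted[0] = true; // the record no longer exists
            return sketch.resolveWithFallback("zk1.example.com").getHostAddress();
        } catch (UnknownHostException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "10.0.0.1"
    }
}
```

In this sketch the caller keeps connecting to 10.0.0.1 even though the name no 
longer resolves to anything.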

> Cache resolved IP addresses as fallback for DNS server failures
> ---------------------------------------------------------------
>
>                 Key: ZOOKEEPER-5014
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-5014
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: java client
>    Affects Versions: 3.9.4
>            Reporter: hoyong.eom
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> h3. Background
> When a ZooKeeper client needs to reconnect, it resolves hostnames via DNS.
> If the DNS server is temporarily unavailable, the client cannot resolve 
> hostnames and fails to reconnect - even if the ZooKeeper servers are healthy 
> and IP addresses haven't changed.
> *Note:* This is different from 
> [ZOOKEEPER-4921|https://issues.apache.org/jira/browse/ZOOKEEPER-4921] 
> (network failure reconnection bug in 3.9.3, fixed in 3.9.4).
> This proposal addresses DNS server outages specifically.
> h3. Problem
> * Client has successfully connected before (DNS was working)
> * DNS server becomes temporarily unavailable (maintenance, restart, network 
> issue)
> * Client tries to reconnect but fails DNS resolution
> * Connection fails even though ZooKeeper server IP hasn't changed
> h3. Proposal
> Cache successfully resolved IP addresses and use them as fallback when DNS 
> resolution fails.
> * New option: {{zookeeper.client.dnsFallback.enabled}} (default: {{false}})
> * On successful DNS resolution: cache the IP address
> * On DNS failure + option enabled: use cached IP as fallback
> * Backward compatible (disabled by default)
> h3. Use Cases
> * On-premise environments with unstable DNS infrastructure
> * Environments where server IP addresses rarely change
> * Temporary DNS outages (maintenance windows, DNS server restarts)
> h3. Implementation
> *ZKClientConfig.java:*
> {code:java}
> public static final String ZOOKEEPER_DNS_FALLBACK_ENABLED = 
> "zookeeper.client.dnsFallback.enabled";
> public static final boolean ZOOKEEPER_DNS_FALLBACK_ENABLED_DEFAULT = false;
> public boolean isDnsFallbackEnabled() {
>     return getBoolean(ZOOKEEPER_DNS_FALLBACK_ENABLED, 
> ZOOKEEPER_DNS_FALLBACK_ENABLED_DEFAULT);
> }
> {code}
> *StaticHostProvider.java:*
> {code:java}
> private final Map<String, InetAddress> resolvedAddressCache = new 
> ConcurrentHashMap<>();
> private InetSocketAddress resolve(InetSocketAddress address) {
>     String hostString = address.getHostString();
>     try {
>         InetAddress resolved = resolver.getAllByName(hostString)[0];
>         // Cache on success
>         resolvedAddressCache.put(hostString, resolved);
>         return new InetSocketAddress(resolved, address.getPort());
>     } catch (UnknownHostException e) {
>         // Fallback to cached IP if enabled
>         if (clientConfig.isDnsFallbackEnabled()) {
>             InetAddress cached = resolvedAddressCache.get(hostString);
>             if (cached != null) {
>                 LOG.warn("DNS failed for {}, using cached IP {}", hostString, cached);
>                 return new InetSocketAddress(cached, address.getPort());
>             }
>         }
>         return address;
>     }
> }
> {code}
> h3. Usage
> {code}
> zookeeper.client.dnsFallback.enabled=true
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
