hoyong.eom created ZOOKEEPER-5014:
-------------------------------------
Summary: Cache resolved IP addresses as fallback for DNS server failures
Key: ZOOKEEPER-5014
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-5014
Project: ZooKeeper
Issue Type: Improvement
Components: java client
Affects Versions: 3.9.4
Reporter: hoyong.eom
h2. Problem
When the DNS server is temporarily unavailable, the ZooKeeper client cannot reconnect to the ZooKeeper server, even if:
- The client was previously connected successfully
- The ZooKeeper server is still running and healthy
- Only the DNS server is down
h2. Current Behavior
In {{StaticHostProvider.resolve()}}, when DNS resolution fails:
{code:java}
} catch (UnknownHostException e) {
    LOG.error("Unable to resolve address: {}", address.toString(), e);
    return address; // Returns unresolved address
}
{code}
The method returns the original, still unresolved {{InetSocketAddress}}, so the subsequent connection attempt fails.
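A minimal, self-contained sketch that reproduces this failure mode without touching a real DNS server. {{UnresolvedFallbackDemo}}, its {{Resolver}} interface and the host {{zk1.example.com}} are hypothetical stand-ins (modeled on the {{resolver}} used in the snippet above, not the actual ZooKeeper classes); the stub resolver always throws {{UnknownHostException}} to simulate the outage, and the catch block mirrors the one quoted above.

{code:java}
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

public class UnresolvedFallbackDemo {

    /** Hypothetical stand-in for the resolver the client uses (not the ZooKeeper type). */
    interface Resolver {
        InetAddress[] getAllByName(String host) throws UnknownHostException;
    }

    /** Mirrors the current catch block: on failure, return the input address unchanged. */
    static InetSocketAddress resolve(Resolver resolver, InetSocketAddress address) {
        try {
            InetAddress resolved = resolver.getAllByName(address.getHostString())[0];
            return new InetSocketAddress(resolved, address.getPort());
        } catch (UnknownHostException e) {
            return address; // still unresolved: no IP to connect to
        }
    }

    public static void main(String[] args) {
        // Simulated DNS outage: every lookup fails.
        Resolver dnsDown = host -> {
            throw new UnknownHostException(host);
        };
        InetSocketAddress input = InetSocketAddress.createUnresolved("zk1.example.com", 2181);
        InetSocketAddress result = resolve(dnsDown, input);
        System.out.println(result.isUnresolved()); // prints "true"
    }
}
{code}

Even though a previous lookup for the same host may have succeeded, nothing from it is kept, so the caller is left with an address it cannot connect to.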
h2. Test Results
|| Test Case || Result || Time ||
| localhost:2181 | Connected | 164ms |
| non-existent-host.invalid:2181 | Failed | 10,005ms (timeout) |
Exception chain:
{code}
UnknownHostException → IllegalArgumentException → ConnectionLossException
{code}
Tested with ZooKeeper client 3.7.2 / 3.9.4, Curator 5.6.0 / 5.7.1, and Java 21.
h2. Proposal
Cache the last successfully resolved IP address for each hostname and use it as a fallback when DNS resolution fails.
{code:java}
private final Map<String, InetAddress> resolvedAddressCache = new ConcurrentHashMap<>();

private InetSocketAddress resolve(InetSocketAddress address) {
    String hostname = address.getHostString();
    try {
        InetAddress resolved = resolver.getAllByName(hostname)[0];
        // Cache on success
        resolvedAddressCache.put(hostname, resolved);
        return new InetSocketAddress(resolved, address.getPort());
    } catch (UnknownHostException e) {
        // Fallback to cached address
        if (clientConfig.isDnsFallbackEnabled()) {
            InetAddress cached = resolvedAddressCache.get(hostname);
            if (cached != null) {
                LOG.warn("DNS failed for {}, using cached address: {}", hostname, cached);
                return new InetSocketAddress(cached, address.getPort());
            }
        }
        LOG.error("Unable to resolve address: {}", address.toString(), e);
        return address;
    }
}
{code}
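For experimenting with the idea outside the client, here is a standalone, runnable sketch of the same last-known-good cache. {{DnsFallbackSketch}}, {{CachingResolver}}, the {{Resolver}} interface and the constructor flag are hypothetical (the flag stands in for the proposed {{clientConfig.isDnsFallbackEnabled()}}), and the "DNS server" is a stub that can be switched off to simulate an outage; {{10.0.0.1}} is an arbitrary example IP.

{code:java}
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class DnsFallbackSketch {

    /** Hypothetical stand-in for the client's resolver. */
    interface Resolver {
        InetAddress[] getAllByName(String host) throws UnknownHostException;
    }

    /** Hypothetical resolver wrapper with a per-hostname last-known-good cache. */
    static final class CachingResolver {
        private final Resolver resolver;
        private final boolean dnsFallbackEnabled; // stands in for the proposed config flag
        // hostname -> last successfully resolved address
        private final Map<String, InetAddress> resolvedAddressCache = new ConcurrentHashMap<>();

        CachingResolver(Resolver resolver, boolean dnsFallbackEnabled) {
            this.resolver = resolver;
            this.dnsFallbackEnabled = dnsFallbackEnabled;
        }

        InetSocketAddress resolve(InetSocketAddress address) {
            String hostname = address.getHostString();
            try {
                InetAddress resolved = resolver.getAllByName(hostname)[0];
                resolvedAddressCache.put(hostname, resolved); // remember the last good answer
                return new InetSocketAddress(resolved, address.getPort());
            } catch (UnknownHostException e) {
                InetAddress cached = dnsFallbackEnabled ? resolvedAddressCache.get(hostname) : null;
                if (cached != null) {
                    return new InetSocketAddress(cached, address.getPort()); // fallback
                }
                return address; // unchanged behavior: unresolved address
            }
        }
    }

    public static void main(String[] args) throws UnknownHostException {
        // getByAddress builds an InetAddress without any DNS lookup.
        InetAddress serverIp = InetAddress.getByAddress("zk1.example.com", new byte[] {10, 0, 0, 1});
        boolean[] dnsUp = {true}; // toggled to simulate an outage
        Resolver flakyDns = host -> {
            if (!dnsUp[0]) {
                throw new UnknownHostException(host);
            }
            return new InetAddress[] {serverIp};
        };

        CachingResolver provider = new CachingResolver(flakyDns, true);
        InetSocketAddress zk = InetSocketAddress.createUnresolved("zk1.example.com", 2181);

        provider.resolve(zk); // DNS up: resolves and caches 10.0.0.1
        dnsUp[0] = false;     // DNS outage begins
        InetSocketAddress fallback = provider.resolve(zk);
        System.out.println(fallback.getAddress().getHostAddress()); // prints "10.0.0.1"
    }
}
{code}

Like the proposal, this caches only the first returned address; a variant could cache the whole {{InetAddress[]}} so that hostnames with several A records keep their full set during an outage.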
h2. Design Considerations
* *Disabled by default*: New property {{zookeeper.client.dnsFallback.enabled=false}}
* *Backward compatible*: Existing behavior is unchanged unless explicitly enabled
* *Complements existing work*: Does not conflict with ZOOKEEPER-2184 (re-resolve on connection failure)
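One way the opt-in flag could be read, as a sketch only: the property name is the one proposed in this issue (it is not an existing ZooKeeper setting), {{DnsFallbackConfig}} is a hypothetical class, and the value is read straight from JVM system properties. Wiring it through {{ZKClientConfig}} instead would be a decision for the PR.

{code:java}
public class DnsFallbackConfig {

    /** Property name proposed in this issue (hypothetical, not an existing ZooKeeper setting). */
    static final String DNS_FALLBACK_ENABLED = "zookeeper.client.dnsFallback.enabled";

    /** Returns false unless the property is set to "true" (case-insensitive). */
    static boolean isDnsFallbackEnabled() {
        return Boolean.parseBoolean(System.getProperty(DNS_FALLBACK_ENABLED, "false"));
    }

    public static void main(String[] args) {
        // false unless the JVM was started with -Dzookeeper.client.dnsFallback.enabled=true
        System.out.println(isDnsFallbackEnabled());
        System.setProperty(DNS_FALLBACK_ENABLED, "true");
        System.out.println(isDnsFallbackEnabled()); // prints "true"
    }
}
{code}

Because {{Boolean.parseBoolean}} treats anything other than "true" as false, a missing or mistyped value keeps the existing behavior, which matches the backward-compatibility goal above.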
h2. Use Case
This is useful in cloud/container environments where:
* DNS server may have temporary failures
* ZooKeeper server IP remains stable
* The client should still be able to reconnect to the known server IP during a DNS outage
h2. Related Issues
* ZOOKEEPER-2184: Re-resolve hosts when connection fails
* ZOOKEEPER-1506: Re-try DNS resolution if node connection fails
* CURATOR-229: No retry on DNS lookup failure
I'm happy to submit a PR if this approach is acceptable.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)