[ 
https://issues.apache.org/jira/browse/CURATOR-229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16218200#comment-16218200
 ] 

jugosag commented on CURATOR-229:
---------------------------------

I would like to second the request to fix this problem. While DNS outages might 
be rare in classical deployment scenarios, they are much more likely in 
Docker-based environments: Docker containers can be given DNS names, but such a 
name is only resolvable once the container has actually started. During startup 
of our stack, often not all ZooKeeper containers of our cluster have started yet 
(because enforcing a fixed startup order is hard to do and an anti-pattern 
anyway), but some ZooKeeper clients that use Curator are already starting up, 
trying to connect to the ensemble, and failing with UnknownHostException 
(which, as was already mentioned in this issue, is not even thrown but reported 
as a background exception, making it even more awkward to write one's own retry 
loop).

So making a DNS lookup error (UnknownHostException) a retryable error, maybe 
optionally, would be really helpful here. This should apply not only during 
Curator startup but also during failover, when Curator/ZookeeperClient switches 
from a failed ZooKeeper instance to another one in the ZooKeeper connect 
string.
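
As a client-side stop-gap until this is fixed, one could wait for the host 
name to become resolvable before building the Curator client. The following is 
only a rough sketch of that idea, not anything Curator provides: the names 
DnsRetry, Resolver, and resolveWithRetry are made up for illustration, the 
backoff values are arbitrary, and it covers only the startup case, not 
failover.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class DnsRetry {

    /** Hypothetical lookup hook so the loop can be exercised without real DNS. */
    @FunctionalInterface
    public interface Resolver {
        InetAddress resolve(String host) throws UnknownHostException;
    }

    /**
     * Tries to resolve the host up to maxAttempts times, sleeping between
     * failed attempts with a doubling delay (capped at 30 seconds).
     * Rethrows the last UnknownHostException if every attempt fails.
     */
    public static InetAddress resolveWithRetry(Resolver resolver, String host,
                                               int maxAttempts, long baseSleepMs)
            throws UnknownHostException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        UnknownHostException last = null;
        long sleepMs = baseSleepMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return resolver.resolve(host);
            } catch (UnknownHostException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(sleepMs);
                    sleepMs = Math.min(sleepMs * 2, 30_000L);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // "localhost" stands in for a ZooKeeper container name (assumption).
        InetAddress addr = resolveWithRetry(InetAddress::getByName, "localhost", 5, 100);
        System.out.println("Resolved: " + addr);
    }
}
```

One caveat with this approach: the JVM caches failed lookups for a short time 
(security property networkaddress.cache.negative.ttl), so very short retry 
intervals may just see the cached failure again.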


> No retry on DNS lookup failure
> ------------------------------
>
>                 Key: CURATOR-229
>                 URL: https://issues.apache.org/jira/browse/CURATOR-229
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Framework
>    Affects Versions: 2.7.0
>            Reporter: Michael Putters
>
> Our environment is set up so that host names (rather than IP addresses) are 
> used when registering services.
> When disconnecting a node from the network, it will attempt to reconnect and 
> - in order to do this - attempts to resolve a host name, which fails (since 
> we have no network connectivity and a DNS server is used).
> It appears this type of exception is not retryable, and the node simply gives 
> up and never reconnects, even when the network connectivity is back.
> Is this the expected behavior? Is there any way to configure Curator so that 
> this type of exception is retryable? I had a look at 
> {{CuratorFrameworkImpl.java}} around line 768 but there doesn't seem to be 
> anything configurable.
> If this is not the expected behavior (or if it is but you don't mind making 
> it configurable), I should be able to provide a patch via a pull request.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
