[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

Michi Mutsuzaki (JIRA) Mon, 20 Apr 2015 15:51:14 -0700

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503871#comment-14503871 ]


Michi Mutsuzaki commented on ZOOKEEPER-1506:
--------------------------------------------

Thanks, Raul, for testing this. I'd try replacing calls to getHostName with 
getHostString. For example, I found another one in QuorumCnxManager.java:

org/apache/zookeeper/server/quorum/QuorumCnxManager.java:
    String addr = self.getElectionAddress().getHostName() + ":" + self.getElectionAddress().getPort();
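
For illustration, here is a minimal sketch of what the replacement could look like. The class HostStringDemo, the helper formatAddress, and the host zk1.example.com are hypothetical names, not ZooKeeper code. It relies on InetSocketAddress.getHostString() (Java 7+), which returns the stored hostname or the literal IP string without attempting a reverse lookup, whereas getHostName() may perform one when the address was built from a bare IP.

```java
import java.net.InetSocketAddress;

// Hypothetical demo class; not part of ZooKeeper.
final class HostStringDemo {
    // Formats an address as "host:port" using getHostString(), which
    // never triggers a reverse DNS lookup.
    static String formatAddress(InetSocketAddress addr) {
        return addr.getHostString() + ":" + addr.getPort();
    }

    public static void main(String[] args) {
        // createUnresolved() stores the hostname without consulting DNS.
        // zk1.example.com is a placeholder hostname.
        InetSocketAddress byName =
                InetSocketAddress.createUnresolved("zk1.example.com", 3888);
        System.out.println(formatAddress(byName));

        // An address built from an IP literal has no stored hostname, so
        // getHostString() falls back to the textual IP address.
        InetSocketAddress byIp = new InetSocketAddress("10.0.0.5", 3888);
        System.out.println(formatAddress(byIp));
    }
}
```

Because the hostname string is kept as configured, it can be handed back to the resolver later rather than being replaced by whatever name a reverse lookup happens to return.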


> Re-try DNS hostname -> IP resolution if node connection fails
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1506
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: server
>    Affects Versions: 3.4.5
>         Environment: Ubuntu 11.04 64-bit
>            Reporter: Mike Heffner
>            Assignee: Michi Mutsuzaki
>            Priority: Critical
>              Labels: patch
>             Fix For: 3.4.7, 3.5.1, 3.6.0
>
>         Attachments: ZOOKEEPER-1506-fix.patch, ZOOKEEPER-1506.patch, 
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
> zk-dns-caching-refresh.patch
>
>
> In our zoo.cfg we use hostnames to identify the ZK servers that are part of 
> an ensemble. These hostnames are configured with a low (<= 60s) TTL and the 
> IP address they map to can and does change. Our procedure for 
> replacing/upgrading a ZK node is to boot an entirely new instance and remap 
> the hostname to the new instance's IP address. Our expectation is that when 
> the original ZK node is terminated/shutdown, the remaining nodes in the 
> ensemble would reconnect to the new instance.
> However, what we are noticing is that the remaining ZK nodes do not attempt 
> to re-resolve the hostname->IP mapping for the new server. Once the original 
> ZK node is terminated, the existing servers continue to attempt contacting it 
> at the old IP address. It would be great if the ZK servers could try to 
> re-resolve the hostname when attempting to connect to a lost ZK server, 
> instead of caching the lookup indefinitely. Currently we must do a rolling 
> restart of the ZK ensemble after swapping a node -- which at three nodes 
> means we periodically lose quorum.
> The exact method we are following is to boot new instances in EC2 and attach 
> one of a set of three Elastic IP addresses. External to EC2 this IP address 
> remains the same and maps to whatever instance it is attached to. Internal to 
> EC2, the elastic IP hostname has a TTL of about 45-60 seconds and is remapped 
> to the internal (10.x.y.z) address of the instance it is attached to. 
> Therefore, in our case we would like ZK to pick up the new 10.x.y.z address 
> that the elastic IP hostname gets mapped to and reconnect appropriately.
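
The behavior requested above could be sketched roughly as follows. This is an illustration under stated assumptions, not the attached patch: ReResolvingConnector, reResolve, and connectWithReResolve are hypothetical names. The idea is to rebuild the InetSocketAddress from its host string after a failed connect so that the resolver is asked again. The JVM's own DNS cache (controlled by the networkaddress.cache.ttl security property) may still return the old IP until its entry expires.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper; not part of ZooKeeper.
final class ReResolvingConnector {
    // Builds a fresh InetSocketAddress from the stored host string, which
    // makes the constructor attempt name resolution again.
    static InetSocketAddress reResolve(InetSocketAddress old) {
        return new InetSocketAddress(old.getHostString(), old.getPort());
    }

    // Tries to connect up to 'attempts' times, re-resolving the hostname
    // after every failure. Rethrows the last IOException if all attempts fail.
    static Socket connectWithReResolve(InetSocketAddress addr,
                                       int timeoutMs,
                                       int attempts) throws IOException {
        if (attempts < 1) {
            throw new IllegalArgumentException("attempts must be >= 1");
        }
        InetSocketAddress current = addr;
        IOException last = null;
        for (int i = 0; i < attempts; i++) {
            Socket socket = new Socket();
            try {
                socket.connect(current, timeoutMs);
                return socket;
            } catch (IOException e) {
                last = e;
                socket.close();
                // The peer may have moved to a new IP; ask DNS again.
                current = reResolve(current);
            }
        }
        throw last;
    }
}
```

A caller would pass the configured server address, for example connectWithReResolve(electionAddr, 5000, 3), where electionAddr and the numeric values are placeholders.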



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
