[ 
https://issues.apache.org/jira/browse/MESOS-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522259#comment-14522259
 ] 

Raul Gutierrez Segales commented on MESOS-2681:
-----------------------------------------------

Yeah, if you are getting a new zk handle after 10s via a zookeeper_init() call, 
that would trigger a DNS lookup.

I think we saw this in prod, but it might have been due to some DNS servers not 
being up to date.

> Slave process must restart to update ensemble members
> -----------------------------------------------------
>
>                 Key: MESOS-2681
>                 URL: https://issues.apache.org/jira/browse/MESOS-2681
>             Project: Mesos
>          Issue Type: Bug
>          Components: slave
>            Reporter: Joe Smith
>
> Right now, if a ZooKeeper ensemble has (for instance) more observers added to 
> it, the Mesos Slaves will not see them, and continue to attempt to connect to 
> only the original members. A restart of the slave process is required to call 
> {{getaddrinfo}} again and enumerate the list of hosts in the ensemble.
> Subsequent {{getaddrinfo}} calls _will only_ occur when {{zookeeper_init()}} 
> is called again, that is to say: when the old session expires and you need to 
> create a new one. If you swap all hosts in your ensemble too fast, without 
> permitting time for old sessions to expire, you'd end up with clients looping 
> forever, trying to connect to the old servers in order to get their old 
> sessions expired.
> This is best tracked by ZOOKEEPER-1998, where there is some discussion about 
> a necessary improvement to the implementation already in the 3.5.x branch, or 
> putting this functionality (debatably a feature vs. fixing a bug) in 3.4.x.
> (Thanks to [~rgs] for reviewing this as well)
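
The issue description above says the ensemble host list is only enumerated when {{getaddrinfo}} runs, which happens at {{zookeeper_init()}}. As a rough illustration of what "re-resolving" would mean from the client side, here is a minimal Python sketch that resolves a connect string and reports whether the set of addresses has changed. It is not Mesos or ZooKeeper client code: the helper names (resolve_ensemble, ensemble_changed), the host name zk.example and the "host:port,host:port" format without a chroot suffix are all assumptions made for the example.

```python
import socket

def resolve_ensemble(connect_string, resolver=socket.getaddrinfo):
    """Resolve a 'host1:2181,host2:2181' string into a set of (ip, port).

    Hypothetical helper: assumes every entry carries an explicit port and
    that there is no chroot suffix such as '/mesos'.
    """
    addresses = set()
    for entry in connect_string.split(","):
        host, _, port = entry.strip().rpartition(":")
        # One name may map to several A/AAAA records (one per ensemble member).
        for info in resolver(host, int(port), type=socket.SOCK_STREAM):
            sockaddr = info[4]
            addresses.add((sockaddr[0], sockaddr[1]))
    return addresses

def ensemble_changed(known, connect_string, resolver=socket.getaddrinfo):
    """Return True if a fresh lookup yields a different address set."""
    return resolve_ensemble(connect_string, resolver) != known
```

A long-running client could call ensemble_changed() on a timer and, when it returns True, decide whether to create a new handle. Whether that is safe while the old session is still live is exactly the question discussed in ZOOKEEPER-1998, so treat this only as a way to detect the change, not as a fix.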



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
