[ https://issues.apache.org/jira/browse/MESOS-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522259#comment-14522259 ]
Raul Gutierrez Segales commented on MESOS-2681:
-----------------------------------------------

Yeah, if you are getting a new zk handle after 10s via a zookeeper_init() call, that would trigger a DNS lookup. I think we saw this in prod, but it might have been due to some DNS servers not being up to date.

> Slave process must restart to update ensemble members
> -----------------------------------------------------
>
>                 Key: MESOS-2681
>                 URL: https://issues.apache.org/jira/browse/MESOS-2681
>             Project: Mesos
>          Issue Type: Bug
>          Components: slave
>            Reporter: Joe Smith
>
> Right now, if a ZooKeeper ensemble has (for instance) more observers added to
> it, the Mesos Slaves will not see them, and continue to attempt to connect to
> only the original members. A restart of the slave process is required to call
> {{getaddrinfo}} again and enumerate the list of hosts in the ensemble.
> Subsequent {{getaddrinfo}} calls _will only_ occur when {{zookeeper_init()}}
> is called again, that is to say: when the old session expires and you need to
> create a new one. If you swap all hosts in your ensemble too fast, without
> permitting time for old sessions to expire, you'd end up with clients looping
> forever, trying to connect to the old servers in order to get their old
> sessions expired.
> This is best tracked by ZOOKEEPER-1998, where there is some discussion about
> a necessary improvement to the implementation already in the 3.5.x branch, or
> putting this functionality (debatably a feature vs. fixing a bug) in 3.4.x.
> (Thanks to [~rgs] for reviewing this as well)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
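
The behaviour described in the issue (hosts are resolved once when the handle is created, and again only when a new handle is created) can be illustrated with a minimal sketch. This is not the ZooKeeper C client or Mesos code: `resolve_ensemble`, `Handle`, and `reinit` are hypothetical names used only for illustration, and the sketch assumes every entry in the connect string has the form `host:port`. The only real API used is Python's `socket.getaddrinfo`.

```python
import socket


def resolve_ensemble(connect_string, resolver=socket.getaddrinfo):
    """Resolve 'host1:2181,host2:2181' into a sorted list of (ip, port) pairs.

    Assumes each entry is 'host:port'; bracketed IPv6 literals are not handled.
    """
    addresses = set()
    for entry in connect_string.split(","):
        host, _, port = entry.strip().rpartition(":")
        for info in resolver(host, int(port), type=socket.SOCK_STREAM):
            sockaddr = info[4]  # (ip, port) for IPv4, (ip, port, ...) for IPv6
            addresses.add((sockaddr[0], sockaddr[1]))
    return sorted(addresses)


class Handle:
    """Hypothetical stand-in for a client handle (not the real ZooKeeper API)."""

    def __init__(self, connect_string, resolver=socket.getaddrinfo):
        self.connect_string = connect_string
        self._resolver = resolver
        # Names are resolved exactly once, when the handle is created.
        self.addresses = resolve_ensemble(connect_string, resolver)

    def reinit(self):
        # Models creating a fresh handle after the old session expires:
        # only at this point are the names resolved again.
        self.addresses = resolve_ensemble(self.connect_string, self._resolver)
```

In this sketch, a record added to DNS after `Handle(...)` is constructed stays invisible in `addresses` until `reinit()` is called, which mirrors why the slave keeps trying only the original ensemble members. The `resolver` parameter exists only so the lookup can be replaced with a stub when experimenting without a network.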