Hi Benjamin and all, I'd like to talk about MESOS-1806. Since I took over this ticket partway through and there was no design doc for it, I have created one based on the current implementation:
https://docs.google.com/document/d/1ccY0XJoOODpIiGPllSVvl7t-YRrIEE_NavfbZHKPWBs/edit?usp=sharing

Besides, there are some details I'd like to discuss:

1. Etcd servers won't accept requests from clients during the leader election phase. So when there is a leader re-election among the etcd servers, the request from the current master to renew the timestamp of the v2/keys/mesos node would fail, and the current code would immediately retry with the next server, which would refuse the request as well. The master would then exit because every server has refused its request. The same happens with slaves: the detector would fail after requests to all the etcd servers are refused. To solve this, we should add logic to wait for a while before trying the next server.

2. If the current master somehow fails to update the v2/keys/mesos node in time, that node would expire, the detector would notice this, and the master would commit suicide due to loss of leadership. This is correct behavior, but the current TTL is rather small: only 5 seconds, and the current master is set to update the node at 80% of the TTL, i.e. at the 4th second, which leaves only 1 second of slack. So the chance of hitting this problem is not that low, e.g. if there is a transient network problem. This can be mitigated by increasing the TTL to 10 seconds and letting the current master try to update the etcd node at 60% of the TTL.

3. The current implementation requires the list of masters to be specified in the "--masters=..." flag (used for the replicated log's quorum). This makes it inconvenient to add new masters to the cluster: every existing master must be restarted with an updated "--masters=" flag. What about creating a directory in the etcd key space and letting each master create a child node in that directory?

Regards,
Shuai
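
P.S. To make points 1 and 2 a bit more concrete, here is a rough sketch of what I have in mind. It is written in Python only for brevity and is not the actual Mesos C++ code; the names renew_with_wait, try_renew, and refresh_margin are hypothetical, and try_renew stands in for whatever call sends the renew request to a single etcd server. The idea is simply to pause between failed attempts instead of giving up as soon as every server has refused once, and to compute how much slack a given TTL and refresh point leave.

```python
import time
from typing import Callable, Optional, Sequence


def renew_with_wait(
    servers: Sequence[str],
    try_renew: Callable[[str], bool],
    max_attempts: int,
    wait_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Try servers round-robin, pausing between failed attempts.

    Returns the server that accepted the request, or None if every
    attempt was refused. All names here are illustrative assumptions.
    """
    if not servers:
        return None
    for attempt in range(max_attempts):
        server = servers[attempt % len(servers)]
        if try_renew(server):
            return server
        # Give the etcd cluster time to finish its election before
        # moving on; skip the pause after the final attempt.
        if attempt < max_attempts - 1:
            sleep(wait_seconds)
    return None


def refresh_margin(ttl_seconds: float, refresh_fraction: float) -> float:
    """Seconds between the scheduled refresh and the key's expiry."""
    return ttl_seconds - ttl_seconds * refresh_fraction
```

With the current settings, refresh_margin(5, 0.8) leaves about 1 second of slack, while the proposed refresh_margin(10, 0.6) leaves about 4 seconds. Together, max_attempts and wait_seconds should add up to less than that slack, otherwise the node would expire while the master is still waiting to retry.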
