[ https://issues.apache.org/jira/browse/MESOS-5576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15347587#comment-15347587 ]
Joseph Wu edited comment on MESOS-5576 at 6/29/16 2:27 AM:
-----------------------------------------------------------

After a discussion with [~benjaminhindman], [~bmahler], and [~jieyu], we determined that {{unlink}} semantics are not adequate when the application level knows about a broken socket (while libprocess does not). Instead, the option to "relink" is preferable, as this should create a new persistent socket, without regard to how other processes are interacting inside libprocess.

Review based on [MESOS-5740]: https://reviews.apache.org/r/49346/

was (Author: kaysoky):
After a discussion with [~benjaminhindman], [~bmahler], and [~jieyu], we determined that {{unlink}} semantics are not adequate when the application level knows about a broken socket (while libprocess does not). Instead, the option to "relink" is preferable, as this should create a new persistent socket, without regard to how other processes are interacting inside libprocess.

See: [MESOS-5740]

> Masters may drop the first message they send between masters after a network partition
> --------------------------------------------------------------------------------------
>
>                 Key: MESOS-5576
>                 URL: https://issues.apache.org/jira/browse/MESOS-5576
>             Project: Mesos
>          Issue Type: Improvement
>          Components: leader election, master, replicated log
>    Affects Versions: 0.28.2
>         Environment: Observed in an OpenStack environment where each master lives on a separate VM.
>            Reporter: Joseph Wu
>            Assignee: Joseph Wu
>              Labels: mesosphere
>
> We observed the following situation in a cluster of five masters:
> || Time || Master 1 || Master 2 || Master 3 || Master 4 || Master 5 ||
> | 0 | Follower | Follower | Follower | Follower | Leader |
> | 1 | Follower | Follower | Follower | Follower || Partitioned from cluster by downing this VM's network ||
> | 2 || Elected Leader by ZK | Voting | Voting | Voting | Suicides due to lost leadership |
> | 3 | Performs consensus | Replies to leader | Replies to leader | Replies to leader | Still down |
> | 4 | Performs writing | Acks to leader | Acks to leader | Acks to leader | Still down |
> | 5 | Leader | Follower | Follower | Follower | Still down |
> | 6 | Leader | Follower | Follower | Follower | Comes back up |
> | 7 | Leader | Follower | Follower | Follower | Follower |
> | 8 || Partitioned in the same way as Master 5 | Follower | Follower | Follower | Follower |
> | 9 | Suicides due to lost leadership || Elected Leader by ZK | Follower | Follower | Follower |
> | 10 | Still down | Performs consensus | Replies to leader | Replies to leader || Doesn't get the message! ||
> | 11 | Still down | Performs writing | Acks to leader | Acks to leader || Acks to leader ||
> | 12 | Still down | Leader | Follower | Follower | Follower |
> Master 2 sends a series of messages to the recently-restarted Master 5. The first message is dropped, but subsequent messages are not dropped.
> This appears to be due to a stale link between the masters. Before leader election, the replicated log actors create a network watcher, which adds links to masters that join the ZK group:
> https://github.com/apache/mesos/blob/7a23d0da817be4e8f68d96f524cecf802431033c/src/log/network.hpp#L157-L159
> This link does not appear to break (Master 2 -> 5) when Master 5 goes down, perhaps due to how the network partition was induced (in the hypervisor layer, rather than in the VM itself).
> When Master 2 tries to send a {{PromiseRequest}} to Master 5, we do not observe the [expected log message|https://github.com/apache/mesos/blob/7a23d0da817be4e8f68d96f524cecf802431033c/src/log/replica.cpp#L493-L494].
> Instead, we see a log line in Master 2:
> {code}
> process.cpp:2040] Failed to shutdown socket with fd 27: Transport endpoint is not connected
> {code}
> The broken link is removed by the libprocess {{socket_manager}} and the following {{WriteRequest}} from Master 2 to Master 5 succeeds via a new socket.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
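
The stale-link behavior described in the issue, and the effect a "relink" would have on it, can be illustrated with a small toy model. This is only a sketch under stated assumptions: it does not use libprocess, and {{ToySocket}}, {{ToyLinkTable}}, {{partition}}, and {{replay}} are hypothetical names invented for illustration. The only behavior it takes from the report is that a send on a silently broken persistent socket loses the message and causes the socket to be discarded, so the next send goes out on a new socket.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a persistent connection. `alive` becomes false
// when the peer goes away without the local side noticing (a stale link).
struct ToySocket {
  explicit ToySocket(int fd_) : fd(fd_), alive(true) {}
  int fd;
  bool alive;
};

// Hypothetical, heavily simplified per-peer link table. NOT libprocess code.
class ToyLinkTable {
public:
  ToyLinkTable() : nextFd_(3) {}

  // Reuse the cached socket for `peer` if there is one, otherwise open one.
  void link(const std::string& peer) {
    if (sockets_.count(peer) == 0) {
      sockets_[peer] = open();
    }
  }

  // Always replace the cached socket, whether or not it looks healthy.
  void relink(const std::string& peer) { sockets_[peer] = open(); }

  // Simulates the peer vanishing silently: the cached socket goes stale.
  void partition(const std::string& peer) {
    auto it = sockets_.find(peer);
    if (it != sockets_.end()) {
      it->second->alive = false;
    }
  }

  // Returns true if `message` was delivered. A send on a stale socket drops
  // the message and evicts the socket, so the next send opens a new one.
  bool send(const std::string& peer, const std::string& message) {
    link(peer);
    std::shared_ptr<ToySocket> socket = sockets_[peer];
    assert(socket != nullptr);
    if (!socket->alive) {
      sockets_.erase(peer);
      return false;
    }
    delivered_.push_back(message);
    return true;
  }

  const std::vector<std::string>& delivered() const { return delivered_; }

private:
  std::shared_ptr<ToySocket> open() {
    return std::make_shared<ToySocket>(nextFd_++);
  }

  int nextFd_;
  std::map<std::string, std::shared_ptr<ToySocket>> sockets_;
  std::vector<std::string> delivered_;
};

// Replays the two sends from the issue, optionally relinking first.
std::vector<bool> replay(bool relinkFirst) {
  ToyLinkTable table;
  table.link("master5");       // link created when the peer joins the group
  table.partition("master5");  // peer goes down; the local socket goes stale
  if (relinkFirst) {
    table.relink("master5");   // application knows the link is suspect
  }
  std::vector<bool> results;
  results.push_back(table.send("master5", "PromiseRequest"));
  results.push_back(table.send("master5", "WriteRequest"));
  return results;
}
```

In this model, {{replay(false)}} loses the {{PromiseRequest}} and delivers the {{WriteRequest}}, matching the observed sequence, while {{replay(true)}} delivers both because the stale socket is replaced before the first send.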