[ 
https://issues.apache.org/jira/browse/MESOS-5576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-5576:
-----------------------------
          Sprint: Mesosphere Sprint 37
    Story Points: 5

> Masters may drop the first message they send between masters after a network 
> partition
> --------------------------------------------------------------------------------------
>
>                 Key: MESOS-5576
>                 URL: https://issues.apache.org/jira/browse/MESOS-5576
>             Project: Mesos
>          Issue Type: Bug
>          Components: leader election, master, replicated log
>    Affects Versions: 0.28.2
>         Environment: Observed in an OpenStack environment where each master 
> lives on a separate VM.
>            Reporter: Joseph Wu
>              Labels: mesosphere
>
> We observed the following situation in a cluster of five masters:
> || Time || Master 1 || Master 2 || Master 3 || Master 4 || Master 5 ||
> | 0 | Follower | Follower | Follower | Follower | Leader |
> | 1 | Follower | Follower | Follower | Follower || Partitioned from cluster by downing this VM's network ||
> | 2 || Elected Leader by ZK | Voting | Voting | Voting | Suicides due to lost leadership |
> | 3 | Performs consensus | Replies to leader | Replies to leader | Replies to leader | Still down |
> | 4 | Performs writing | Acks to leader | Acks to leader | Acks to leader | Still down |
> | 5 | Leader | Follower | Follower | Follower | Still down |
> | 6 | Leader | Follower | Follower | Follower | Comes back up |
> | 7 | Leader | Follower | Follower | Follower | Follower |
> | 8 || Partitioned in the same way as Master 5 | Follower | Follower | Follower | Follower |
> | 9 | Suicides due to lost leadership || Elected Leader by ZK | Follower | Follower | Follower |
> | 10 | Still down | Performs consensus | Replies to leader | Replies to leader || Doesn't get the message! ||
> | 11 | Still down | Performs writing | Acks to leader | Acks to leader || Acks to leader ||
> | 12 | Still down | Leader | Follower | Follower | Follower |
> Master 2 sends a series of messages to the recently restarted Master 5.  The 
> first message is dropped, but subsequent messages are delivered.
> This appears to be due to a stale link between the masters.  Before leader 
> election, the replicated log actors create a network watcher, which adds 
> links to masters that join the ZK group:
> https://github.com/apache/mesos/blob/7a23d0da817be4e8f68d96f524cecf802431033c/src/log/network.hpp#L157-L159
> This link does not appear to break (Master 2 -> 5) when Master 5 goes down, 
> perhaps due to how the network partition was induced (in the hypervisor 
> layer, rather than in the VM itself).
> When Master 2 tries to send a {{PromiseRequest}} to Master 5, we do not 
> observe the [expected log 
> message|https://github.com/apache/mesos/blob/7a23d0da817be4e8f68d96f524cecf802431033c/src/log/replica.cpp#L493-L494].
> Instead, we see a log line in Master 2:
> {code}
> process.cpp:2040] Failed to shutdown socket with fd 27: Transport endpoint is not connected
> {code}
> The broken link is removed by the libprocess {{socket_manager}} and the 
> following {{WriteRequest}} from Master 2 to Master 5 succeeds via a new 
> socket.
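
The failure mode described in the issue can be illustrated with a toy model. This is only a sketch of the suspected cause (a cached link that is never invalidated), not libprocess code: `Peer`, `FakeSocket`, and `ToySocketManager` are hypothetical names invented for this example, and the real {{socket_manager}} is considerably more involved.

```python
class Peer:
    """Hypothetical stand-in for a master; records the messages it receives."""

    def __init__(self, name):
        self.name = name
        self.inbox = []


class FakeSocket:
    """Hypothetical stand-in for a TCP socket to one peer."""

    def __init__(self, peer):
        self.peer = peer
        self.alive = True  # set to False to simulate an undetected partition

    def send(self, message):
        if not self.alive:
            # Models the "Transport endpoint is not connected" failure.
            raise ConnectionError("Transport endpoint is not connected")
        self.peer.inbox.append(message)


class ToySocketManager:
    """Caches one persistent link per peer (assumption: no health checks)."""

    def __init__(self):
        self.links = {}

    def link(self, peer):
        # Reuse a cached link if present; nothing verifies it still works.
        if peer.name not in self.links:
            self.links[peer.name] = FakeSocket(peer)
        return self.links[peer.name]

    def send(self, peer, message):
        sock = self.link(peer)
        try:
            sock.send(message)
            return True
        except ConnectionError:
            # The broken link is discarded, but the message is not retried.
            del self.links[peer.name]
            return False


master5 = Peer("master5")
manager = ToySocketManager()             # plays the role of Master 2
manager.link(master5)                    # link made when Master 5 joins the group
manager.links["master5"].alive = False   # Master 5 restarts; link goes stale

results = [
    manager.send(master5, "PromiseRequest"),  # dropped on the stale link
    manager.send(master5, "WriteRequest"),    # succeeds on a fresh socket
]
```

In this model the first send fails and only removes the stale link, so `results` is `[False, True]` and Master 5's inbox holds just the `WriteRequest`, matching rows 10 and 11 of the table.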



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
