[ 
https://issues.apache.org/jira/browse/MESOS-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998686#comment-14998686
 ] 

haosdent commented on MESOS-3870:
---------------------------------

Could ProcessManager dequeue the same Process in different worker threads?
{noformat}
ProcessBase* ProcessManager::dequeue()
{
  // TODO(benh): Remove a process from this thread's runq. If there
  // are no processes to run, and this is not a dedicated thread, then
  // steal one from another threads runq.

  ProcessBase* process = NULL;

  synchronized (runq_mutex) {
    if (!runq.empty()) {
      process = runq.front();
      runq.pop_front();
      // Increment the running count of processes in order to support
      // the Clock::settle() operation (this must be done atomically
      // with removing the process from the runq).
      running.fetch_add(1);
    }
  }

  return process;
}
{noformat}
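
To make the question concrete, here is a minimal sketch of the same locking pattern. It is not libprocess code: {{RunQueue}} and {{countDuplicateDequeues}} are hypothetical names, and plain integer ids stand in for {{ProcessBase*}}. Because the pop happens while the mutex is held, each queue entry is handed to exactly one thread, so the same Process could only reach two worker threads if it had been enqueued twice.

```cpp
#include <cstddef>
#include <deque>
#include <mutex>
#include <set>
#include <thread>
#include <vector>

// Hypothetical stand-in for ProcessManager's run queue: a deque of ids
// guarded by a single mutex, popped the same way as dequeue() above.
struct RunQueue
{
  std::mutex mutex;
  std::deque<int> runq;

  // Returns the next id, or -1 if the queue is empty.
  int dequeue()
  {
    std::lock_guard<std::mutex> lock(mutex);
    if (runq.empty()) {
      return -1;
    }
    int id = runq.front();
    runq.pop_front();
    return id;
  }
};

// Drains `entries` queued ids using `workers` threads and returns how
// many ids were handed out more than once.
std::size_t countDuplicateDequeues(int entries, int workers)
{
  RunQueue queue;
  for (int i = 0; i < entries; ++i) {
    queue.runq.push_back(i);
  }

  // Each worker records the ids it received in its own vector.
  std::vector<std::vector<int>> seen(workers);
  std::vector<std::thread> threads;
  for (int w = 0; w < workers; ++w) {
    threads.emplace_back([&queue, &seen, w]() {
      for (int id = queue.dequeue(); id != -1; id = queue.dequeue()) {
        seen[w].push_back(id);
      }
    });
  }
  for (std::thread& t : threads) {
    t.join();
  }

  // Any id seen by more than one worker shows up as total > unique.
  std::set<int> unique;
  std::size_t total = 0;
  for (const std::vector<int>& ids : seen) {
    total += ids.size();
    unique.insert(ids.begin(), ids.end());
  }
  return total - unique.size();
}
```

If this holds for the real runq, the thing to check is whether a Process can be put on the runq a second time while a worker thread is still running it.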

> Prevent out-of-order libprocess message delivery
> ------------------------------------------------
>
>                 Key: MESOS-3870
>                 URL: https://issues.apache.org/jira/browse/MESOS-3870
>             Project: Mesos
>          Issue Type: Bug
>          Components: libprocess
>            Reporter: Neil Conway
>            Priority: Minor
>              Labels: mesosphere
>
> I was under the impression that {{send()}} provided in-order, unreliable 
> message delivery. So if P1 sends <M1,M2> to P2, P2 might see <>, <M1>, <M2>, 
> or <M1,M2> — but not <M2,M1>.
> I suspect much of the code makes a similar assumption. However, it appears 
> that this behavior is not guaranteed. slave.cpp:2217 has the following 
> comment:
> {noformat}
>   // TODO(jieyu): Here we assume that CheckpointResourcesMessages are
>   // ordered (i.e., slave receives them in the same order master sends
>   // them). This should be true in most of the cases because TCP
>   // enforces in order delivery per connection. However, the ordering
>   // is technically not guaranteed because master creates multiple
>   // connections to the slave in some cases (e.g., persistent socket
>   // to slave breaks and master uses ephemeral socket). This could
>   // potentially be solved by using a version number and rejecting
>   // stale messages according to the version number.
> {noformat}
> We can improve this situation by _either_: (1) fixing libprocess to guarantee 
> ordered message delivery, e.g., by adding a sequence number, or (2) 
> clarifying that ordered message delivery is not guaranteed, and ideally 
> providing a tool to force messages to be delivered out-of-order.
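
A rough sketch of what option (1), or the version number from the TODO above, might look like on the receiving side. This is an assumption about one possible design, not existing libprocess API: {{SequenceFilter}} and {{accept}} are hypothetical names, and the sender is assumed to stamp each message with a strictly increasing number. Messages may still be lost, but a message that arrives after a newer one is dropped, so P2 could see <>, <M1>, <M2>, or <M1,M2>, and never <M2,M1>.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical receiver-side filter: remembers the highest sequence
// number accepted from each sender and rejects anything not newer.
class SequenceFilter
{
public:
  // Returns true if the message should be delivered, false if it is
  // stale (its sequence number is not greater than the last one
  // accepted from the same sender).
  bool accept(const std::string& sender, std::uint64_t sequence)
  {
    auto it = latest.find(sender);
    if (it != latest.end() && sequence <= it->second) {
      return false;
    }
    latest[sender] = sequence;
    return true;
  }

private:
  // Highest accepted sequence number, keyed by sender.
  std::map<std::string, std::uint64_t> latest;
};
```

Keeping the state per sender means a reordering on one connection does not affect messages from other senders. A sender that restarts and resets its counter would need separate handling, which this sketch leaves out.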



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
