-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14686/
-----------------------------------------------------------

Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.


Repository: mesos-git


Description
-------

This is the second patch of a series of patches that implement a catch-up 
mechanism for replicated log. This patch fixed a race condition caused by 
passing process wrapper pointers (e.g. Network, Replica, etc.).

In our code base, one commonly used routine is to use a wrapper class to wrap a 
libprocess so that we don't need to worry about whether we should directly call 
the function or should use a dispatch. For example:

class FooProcess : public Process<FooProcess> {
  int foo();
};

class Foo {
  Future<int> foo() { return dispatch(process, &FooProcess::foo); }
  ...
  FooProcess* process;
};

If multiple entities want to share a libprocess (e.g. FooProcess), currently, 
we pass a pointer of its wrapper (e.g. Foo*) to the users. However, this is 
problematic in some cases. The main reason is that we don't know when it is 
safe to destroy the libprocess (FooProcess) and delete the wrapper (Foo) if a 
pointer of this wrapper has been passed to some other entities in the system 
and we don't know whether they will use it or not later.

There is a race condition in our first patch. Here is the buggy interleaving:

T1                                   T2

appending.discard();
                                     ExplicitPromiseProcess::initialize() {
onDiscarded() {
  terminate ExplicitPromiseProcess
}

delete network;
                      
                                       ...
                                       network->broadcast(protocol::promise, 
request)
                                     }

(We added a new test CoordinatorTest.FillTimeout which can reproduce this bug 
and will segfault without this patch.)


To fix this problem, we introduce a concept called ProcessHandle to generalize 
process wrappers (see process.hpp). It internally maintains a shared pointer to 
the underlying libprocess. It is copyable. Multiple copies of the same handle 
share the same underlying libprocess. When the last copy is deleted, we will 
terminate the underlying process and delete the libprocess. If a user wants to 
early terminate the libprocess, he can explicitly call the terminate() function.

Now, creating a wrapper for a libprocess is simply as that:

class Foo : public ProcessHandle<FooProcess> { ... };

This change won't break the existing code (i.e., if you use the old way of 
passing wrapper pointers, this change is like a noop).

This is a joint work with Yan Xu.


Diffs
-----

  3rdparty/libprocess/include/process/process.hpp b502324 
  src/Makefile.am a2d8242 
  src/log/catchup.hpp PRE-CREATION 
  src/log/catchup.cpp PRE-CREATION 
  src/log/consensus.hpp PRE-CREATION 
  src/log/consensus.cpp PRE-CREATION 
  src/log/coordinator.hpp 3f6fb7c 
  src/log/coordinator.cpp 6e6466f 
  src/log/log.hpp 77edc7a 
  src/log/network.hpp d34cf78 
  src/log/replica.hpp d1f5ead 
  src/log/replica.cpp 59a6ff3 
  src/log/zookeeper.hpp PRE-CREATION 
  src/log/zookeeper.cpp PRE-CREATION 
  src/tests/log_tests.cpp ff5f86c 

Diff: https://reviews.apache.org/r/14686/diff/


Testing
-------

bin/mesos-tests.sh --gtest_filter=CoordinatorTest*:ReplicaTest*:LogTest* 
--gtest_repeat=100


Thanks,

Jie Yu

Reply via email to