----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/18584/#review36447 -----------------------------------------------------------
src/log/recover.cpp <https://reviews.apache.org/r/18584/#comment67422> How do you feel about pulling the watch and broadcast apart: chain = watch() .then(defer(self(), &Self::broadcast)) .then(defer(self(), &Self::receive, lambda::_1)) .onAny(defer(self(), &Self::finished, lambda::_1)); Then we could make '_broadcast' only do the setting Future<Nothing> broadcasted() { responses = _responses; // Reset the counters. responsesReceived.clear(); lowestBeginPosition = None(); highestEndPosition = None(); return Nothing(); } This allows 'handle' to become 'receive' and '_handle' to become 'received': Future<Option<RecoverResponse> > receive() { // Instead of using a for loop here, we use select to process // responses one after another so that we can ignore the rest if // we have collected enough responses. return select(responses) .then(defer(self(), &Self::received, lambda::_1)); } src/log/recover.cpp <https://reviews.apache.org/r/18584/#comment67427> Is there a race between watch() and broadcast() that can cause us to select() on an empty set and get stuck? watch() -> returns size == quorum // Now ZK blip, 0 network members. broadcast() -> returns empty set select(empty_set) -> stuck pending? Is this safe? Is there a timeout built-in at a higher layer to bail if we get stuck? src/log/recover.cpp <https://reviews.apache.org/r/18584/#comment67393> How about: "Starting replica recovery" src/log/recover.cpp <https://reviews.apache.org/r/18584/#comment67394> Do we still need this now that finalize() is a no-op? src/log/recover.cpp <https://reviews.apache.org/r/18584/#comment67406> Do we need to have these helpers that implicitly set 'status' instead of just returning the Metadata::Status? It looks like we can remove the optimization in updateReplicaStatus and avoid the need to store a 'status' member. If the optimization is important, Replica::update(Metadata::Status) could avoid the write if there's no change, right? src/log/recover.cpp <https://reviews.apache.org/r/18584/#comment67408> Can you move discard up to the beginning of the private block? - Ben Mahler On March 6, 2014, 10:24 p.m., Jie Yu wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/18584/ > ----------------------------------------------------------- > > (Updated March 6, 2014, 10:24 p.m.) > > > Review request for mesos, Benjamin Hindman and Ben Mahler. > > > Bugs: MESOS-984 > https://issues.apache.org/jira/browse/MESOS-984 > > > Repository: mesos-git > > > Description > ------- > > Previously, we use onAny to connect most of the steps. As a result, we see a > lot of the error checking code like the following: > > void checked(const Future<Metadata::Status>& future) > { > if (!future.isReady()) { > promise.fail( > future.isFailed() ? > "Failed to get replica status: " + future.failure() : > "Not expecting discarded future"); > terminate(self()); > return; > } > ... > } > > Another drawback is: discarding becomes difficult and hard to reason. > > Now, I cleaned up the code to use the continuation style (i.e., chaining > using Future.then). The code becomes much more clear now! > > Also, I factored out the recover protocol part from the log recover process > to make the log recover process less cumbersome. > > This patch does not change any semantics, but is for MESOS-984 (log auto > initialization for registrar). Will follow up with a patch to integrate the > log auto initialization based on this patch. > > > Diffs > ----- > > src/log/recover.cpp e611a4e > > Diff: https://reviews.apache.org/r/18584/diff/ > > > Testing > ------- > > make check > > > Thanks, > > Jie Yu > >
