----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25614/#review53440 -----------------------------------------------------------
Patch looks great! Reviews applied: [25614] All tests passed. - Mesos ReviewBot On Sept. 15, 2014, 8:54 p.m., Connor Doyle wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/25614/ > ----------------------------------------------------------- > > (Updated Sept. 15, 2014, 8:54 p.m.) > > > Review request for mesos and Niklas Nielsen. > > > Bugs: MESOS-1795 > https://issues.apache.org/jira/browse/MESOS-1795 > > > Repository: mesos-git > > > Description > ------- > > ## OVERVIEW > > This change adds a measure of safety for calls to `Future<T>::await()` > without a timeout from within the JNI interface to the state > abstraction. > > ## MOTIVATION > > Observed the following log output prior to a crash of the Marathon scheduler: > > ``` > Sep 12 23:46:01 highly-available-457-540 marathon[11494]: F0912 > 23:46:01.771927 11532 org_apache_mesos_state_AbstractState.cpp:145] > CHECK_READY(*future): is PENDING > Sep 12 23:46:01 highly-available-457-540 marathon[11494]: *** Check failure > stack trace: *** > Sep 12 23:46:01 highly-available-457-540 marathon[11494]: @ > 0x7febc2663a2d google::LogMessage::Fail() > Sep 12 23:46:01 highly-available-457-540 marathon[11494]: @ > 0x7febc26657e3 google::LogMessage::SendToLog() > Sep 12 23:46:01 highly-available-457-540 marathon[11494]: @ > 0x7febc2663648 google::LogMessage::Flush() > Sep 12 23:46:01 highly-available-457-540 marathon[11494]: @ > 0x7febc266603e google::LogMessageFatal::~LogMessageFatal() > Sep 12 23:46:01 highly-available-457-540 marathon[11494]: @ > 0x7febc26588a3 Java_org_apache_mesos_state_AbstractState__1_1fetch_1get > Sep 12 23:46:01 highly-available-457-540 marathon[11494]: @ > 0x7febcd107d98 (unknown) > ``` > _Listing 1: Crash log output._ > > ## CHANGES IN DETAIL > > The in-line comments in `Latch::await(const Duration& duration)`, mention > that it's possible for the call to return before the underlying process > has completed for two reasons: > > ``` > ... > process::wait(pid, duration); // Explict to disambiguate. > // It's possible that we failed to wait because: > // (1) Our process has already terminated. > // (2) We timed out (i.e., duration was not "infinite"). > ... > ``` > _Listing 2: Excerpt from 3rdparty/src/libprocess/latch.cpp._ > > Call sites within `org_apache_mesos_state_AbstractState.cpp` > that formerly discarded the return value now throw a > `java.concurrent.ExecutionException` if the await fails. I chose to throw > this instead of a `java.concurrent.TimeoutException` since the expectation > from the caller's perspective is to block forever pending a ready state. > > > Diffs > ----- > > src/java/jni/org_apache_mesos_state_AbstractState.cpp 1accc8a > > Diff: https://reviews.apache.org/r/25614/diff/ > > > Testing > ------- > > - make > - make check > > > Thanks, > > Connor Doyle > >