----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/65759/#review198382 -----------------------------------------------------------
src/docker/executor.cpp Lines 242-247 (patched) <https://reviews.apache.org/r/65759/#comment278523> Doesn't this render the `onFailed` callback registered on L357 useless? i.e., the `inspect` future will never transition to the failed state? If so, we should probably either remove the `onFailed` callback, or do this instead: ``` if (!future.hasDiscard()) { return Break(future); } ``` I guess the question is: if the inspect call actually fails, rather than hanging, do we want to retry? It looks like there are several cases in `Docker::inspect` which will result in a failure (failed to create subprocess, failed to read stdout, etc..), and it looks to me like we could probably just retry in those cases. WDYT? src/docker/executor.cpp Lines 243 (patched) <https://reviews.apache.org/r/65759/#comment278522> Nit: indent two more spaces, to avoid identical indentation to the following line. - Greg Mann On Feb. 22, 2018, 8:32 p.m., Andrei Budnik wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/65759/ > ----------------------------------------------------------- > > (Updated Feb. 22, 2018, 8:32 p.m.) > > > Review request for mesos, Alexander Rukletsov, Gilbert Song, Greg Mann, and > Michael Park. > > > Bugs: MESOS-8574 > https://issues.apache.org/jira/browse/MESOS-8574 > > > Repository: mesos > > > Description > ------- > > This patch adds retries for `inspect` command to workaround docker > daemon hangs. We assume that the docker daemon can be temporarily > unresponsive. If it's unresponsive, then any started docker cli > command hangs. To address the issue, we retry `inspect` in the loop. > > > Diffs > ----- > > src/docker/executor.cpp 93c3e1d1e86814e34cbe5b045f6e61911266c535 > > > Diff: https://reviews.apache.org/r/65759/diff/3/ > > > Testing > ------- > > internal CI > > Manually, described in /r/65713 > > > Thanks, > > Andrei Budnik > >