> On July 12, 2016, 5:29 p.m., Jiang Yan Xu wrote:
> > src/slave/slave.cpp, line 4803
> > <https://reviews.apache.org/r/48313/diff/7/?file=1441668#file1441668line4803>
> >
> >     "or no target resources are present": We are inside the
> >     
> >     ```
> >     if (resourcesState.get().target.isSome()) {
> >     }
> >     ```
> >     
> >     block, so we are certain that the target exists right?
> >     
> >     ```
> >     CHECK(os::exists(paths::getResourcesTargetPath(metaDir)));
> >     ```
> >     
> >     instead?

I fixed the comment but did not add the `CHECK()`. Although this should never happen, crashing the agent does not seem necessary, especially because we do a `LOG(ERROR)` if `os::rm()` fails on the target resources file.


- Anindya


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48313/#review141912
-----------------------------------------------------------


On July 11, 2016, 9:42 p.m., Anindya Sinha wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48313/
> -----------------------------------------------------------
> 
> (Updated July 11, 2016, 9:42 p.m.)
> 
> 
> Review request for mesos, Neil Conway and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-5448
>     https://issues.apache.org/jira/browse/MESOS-5448
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Consistency in persistent volumes between master and agent on failure.
> 
> When the agent receives a CheckpointedResourcesMessage, we store the
> target checkpoint on disk. On successful creation and destruction of
> persistent volumes while handling this message, we commit the
> checkpoint to disk and clear the target checkpoint.
> 
> However, in case of any failure we do not commit the checkpoint to
> disk, and the agent exits. When the agent restarts and finds a target
> checkpoint on disk that differs from the committed checkpoint, we
> retry syncing the target and committed checkpoints. On success, we
> reregister the agent with the master; on failure, we do not commit
> the checkpoint and the agent exits.
> 
> 
> Diffs
> -----
> 
>   src/slave/paths.hpp 339e539863c678b6ed4d4670d75c7ff4c54daa79 
>   src/slave/paths.cpp 03157f93b1e703006f95ef6d0a30afae375dcdb5 
>   src/slave/slave.hpp 42afa9e2ebe5cf8e35802c8d169f52879d6073ac 
>   src/slave/slave.cpp 02982d542c9e6b5b5f7fc8b3c73db6f5bac01358 
>   src/slave/state.hpp 0de2a4ee4fabaad612c4526166157b001c380bdb 
>   src/slave/state.cpp 9cec0868b1187ed3ccac7f065e8a21c2f52178d9 
> 
> Diff: https://reviews.apache.org/r/48313/diff/
> 
> 
> Testing
> -------
> 
> All tests passed.
> 
> 
> Thanks,
> 
> Anindya Sinha
> 
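
For readers following the thread, the target/commit protocol described in the review request can be sketched in standalone C++17. This is not the Mesos implementation. `CheckpointPaths`, `ApplyFn`, `handleCheckpoint()`, `commit()`, `removeTarget()` and `recover()` are hypothetical names, resources are modelled as an opaque string, the apply step stands in for creating or destroying persistent volumes, and `std::filesystem` replaces Mesos' own `os::` helpers. In line with the reply about `CHECK()`, a failure to remove the target file is only logged.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <functional>
#include <iostream>
#include <optional>
#include <sstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical stand-in for applying resources (e.g. creating or destroying
// persistent volumes). Returns false on failure.
using ApplyFn = std::function<bool(const std::string&)>;

struct CheckpointPaths {
  fs::path committed;  // last successfully applied resources
  fs::path target;     // resources we intend to apply next
};

std::optional<std::string> readFile(const fs::path& path) {
  std::ifstream in(path, std::ios::binary);
  if (!in) return std::nullopt;  // missing or unreadable file
  std::ostringstream ss;
  ss << in.rdbuf();
  return ss.str();
}

bool writeFile(const fs::path& path, const std::string& data) {
  std::ofstream out(path, std::ios::binary | std::ios::trunc);
  out << data;
  out.close();
  return !out.fail();
}

// Clears the target checkpoint. A failed removal is logged, not fatal.
void removeTarget(const fs::path& target) {
  std::error_code ec;
  fs::remove(target, ec);
  if (ec) {
    std::cerr << "ERROR: failed to remove target checkpoint " << target
              << ": " << ec.message() << '\n';
  }
}

// Commit: write to a temp file, rename it over the committed checkpoint,
// then clear the target.
bool commit(const CheckpointPaths& paths, const std::string& resources) {
  fs::path tmp = paths.committed;
  tmp += ".tmp";
  if (!writeFile(tmp, resources)) return false;
  std::error_code ec;
  fs::rename(tmp, paths.committed, ec);  // replaces atomically on POSIX
  if (ec) return false;
  removeTarget(paths.target);
  return true;
}

// Handling an update: record the target first, apply it, and commit only on
// success. On failure the target stays on disk and the caller would exit.
bool handleCheckpoint(const CheckpointPaths& paths,
                      const std::string& resources, const ApplyFn& apply) {
  if (!writeFile(paths.target, resources)) return false;
  if (!apply(resources)) return false;
  return commit(paths, resources);
}

// Recovery on restart: retry a pending target that differs from the
// committed checkpoint before the agent reregisters.
bool recover(const CheckpointPaths& paths, const ApplyFn& apply) {
  std::optional<std::string> target = readFile(paths.target);
  if (!target) return true;  // nothing pending
  std::optional<std::string> committed = readFile(paths.committed);
  if (committed && *committed == *target) {
    removeTarget(paths.target);  // already in sync
    return true;
  }
  if (!apply(*target)) return false;  // do not commit; agent exits
  return commit(paths, *target);
}
```

Writing the target before applying anything means a crash at any point leaves enough on disk for `recover()` to retry, while the committed file only ever reflects resources that were applied successfully.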