> On Nov. 27, 2017, 9:03 p.m., Jie Yu wrote: > > src/master/master.cpp > > Lines 7211-7218 (patched) > > <https://reviews.apache.org/r/63732/diff/7/?file=1901207#file1901207line7211> > > > > i feel this is unnecessary. It should be the responsibility of the > > agent to convert the operation to post refinement format if it is every > > downgraded before checkpointing. > > > > We can add some CHECK here.
I removed the explicit conversion and the check for now. > On Nov. 27, 2017, 9:03 p.m., Jie Yu wrote: > > src/master/master.cpp > > Lines 7231-7244 (patched) > > <https://reviews.apache.org/r/63732/diff/7/?file=1901207#file1901207line7231> > > > > Hum, what if this is a failover master and the UpdateSlaveMessage is a > > result of an LRP re-registration with the agent. In that case, master does > > not know about the operation but the agent does? > > > > Also, i would be future proof and does not check the latest state of > > the operation. It's possible that the operaiton is terminal but still > > waiting for ack from the framework, and master will still maintain that > > operation. I would still check the invariant that if the master knows about > > the RP, then it should know more operation than the agent. > > > > There might be one edge case worth documenting: master receives an ack > > from the framework, and removed the operation and send an Ack message to > > the agent. The agent didn't receive the ack yet before it sends an > > UpdateSlaveMessage. In that case, what we should do? Adding back the > > operation does not sound right because the agent will then delete the > > operation on receiving ack. Then we have an inconsistency between master > > and agent. To deal with master failover, I guarded this code behind a heuristic to detect whether this resource provider is known. I added a comment explaining why we only check non-terminal offer operations here. I also added some more documentation to the path removing unknown offer operations from master state below. > On Nov. 27, 2017, 9:03 p.m., Jie Yu wrote: > > src/master/master.cpp > > Lines 7302-7308 (patched) > > <https://reviews.apache.org/r/63732/diff/7/?file=1901207#file1901207line7302> > > > > We do want to support that when a CSI plugin detects a chagne to the > > total resources. I removed the complete else branch here and updated the code dealing with "known" resource providers below (with special treatment for resource providers changing from or to zero capacity). - Benjamin ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/63732/#review191911 ----------------------------------------------------------- On Nov. 28, 2017, 10:32 p.m., Benjamin Bannier wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/63732/ > ----------------------------------------------------------- > > (Updated Nov. 28, 2017, 10:32 p.m.) > > > Review request for mesos, Jie Yu and Jan Schlicht. > > > Bugs: MESOS-8207 > https://issues.apache.org/jira/browse/MESOS-8207 > > > Repository: mesos > > > Description > ------- > > See summary. > > > Diffs > ----- > > src/master/master.cpp 700e12433b0b66efc3f5dd296711c0f203a13144 > > > Diff: https://reviews.apache.org/r/63732/diff/8/ > > > Testing > ------- > > `make check`, tested as part of https://reviews.apache.org/r/63843/. > > This patch requires `protobuf::isSpeculativeOperation` from > https://reviews.apache.org/r/63751/ which is _not_ part of this review chain > (to avoid multiple dependent RRs). > > > Thanks, > > Benjamin Bannier > >