> On Nov. 27, 2017, 9:03 p.m., Jie Yu wrote:
> > src/master/master.cpp
> > Lines 7211-7218 (patched)
> > <https://reviews.apache.org/r/63732/diff/7/?file=1901207#file1901207line7211>
> >
> >     i feel this is unnecessary. It should be the responsibility of the 
> > agent to convert the operation to post refinement format if it is every 
> > downgraded before checkpointing.
> >     
> >     We can add some CHECK here.

I removed the explicit conversion and the check for now.


> On Nov. 27, 2017, 9:03 p.m., Jie Yu wrote:
> > src/master/master.cpp
> > Lines 7231-7244 (patched)
> > <https://reviews.apache.org/r/63732/diff/7/?file=1901207#file1901207line7231>
> >
> >     Hum, what if this is a failover master and the UpdateSlaveMessage is a 
> > result of an LRP re-registration with the agent. In that case, master does 
> > not know about the operation but the agent does?
> >     
> >     Also, i would be future proof and does not check the latest state of 
> > the operation. It's possible that the operaiton is terminal but still 
> > waiting for ack from the framework, and master will still maintain that 
> > operation. I would still check the invariant that if the master knows about 
> > the RP, then it should know more operation than the agent.
> >     
> >     There might be one edge case worth documenting: master receives an ack 
> > from the framework, and removed the operation and send an Ack message to 
> > the agent. The agent didn't receive the ack yet before it sends an 
> > UpdateSlaveMessage. In that case, what we should do? Adding back the 
> > operation does not sound right because the agent will then delete the 
> > operation on receiving ack. Then we have an inconsistency between master 
> > and agent.

To deal with master failover, I guarded this code behind a heuristic to detect 
whether this resource provider is known.

I added a comment explaining why we only check non-terminal offer operations 
here. I also added some more documentation to the path removing unknown offer 
operations from master state below.


> On Nov. 27, 2017, 9:03 p.m., Jie Yu wrote:
> > src/master/master.cpp
> > Lines 7302-7308 (patched)
> > <https://reviews.apache.org/r/63732/diff/7/?file=1901207#file1901207line7302>
> >
> >     We do want to support that when a CSI plugin detects a chagne to the 
> > total resources.

I removed the complete else branch here and updated the code dealing with 
"known" resource providers below (with special treatment for resource providers 
changing from or to zero capacity).


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63732/#review191911
-----------------------------------------------------------


On Nov. 28, 2017, 10:32 p.m., Benjamin Bannier wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63732/
> -----------------------------------------------------------
> 
> (Updated Nov. 28, 2017, 10:32 p.m.)
> 
> 
> Review request for mesos, Jie Yu and Jan Schlicht.
> 
> 
> Bugs: MESOS-8207
>     https://issues.apache.org/jira/browse/MESOS-8207
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp 700e12433b0b66efc3f5dd296711c0f203a13144 
> 
> 
> Diff: https://reviews.apache.org/r/63732/diff/8/
> 
> 
> Testing
> -------
> 
> `make check`, tested as part of https://reviews.apache.org/r/63843/.
> 
> This patch requires `protobuf::isSpeculativeOperation` from 
> https://reviews.apache.org/r/63751/ which is _not_ part of this review chain 
> (to avoid multiple dependent RRs).
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>

Reply via email to