-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63797/#review190972
-----------------------------------------------------------




src/slave/slave.cpp
Lines 3568-3574 (original), 3573-3578 (patched)
<https://reviews.apache.org/r/63797/#comment268564>

    Hmm, what if the checkpoint of the `TargetPath` succeeded but the commit 
failed? Should we delete the `TargetPath` so that it won't be retried after 
agent failover? What if the removal also fails?
    
    This is indeed quite tricky. The reason the Apple folks did the 
prepare+commit thing is to make sure master and agent are in sync in case of a 
failed old operation. That's exactly the problem we're trying to solve here. 
That makes me wonder whether we still need this prepare+commit style 
checkpointing for the `ApplyOfferOperationMessage` path (which is guaranteed 
to come from a new master).
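
    To make the failure modes above concrete, here is a rough sketch of a 
prepare+commit sequence. It is not the actual `checkpointResources` code: the 
function name, the result enum and the plain-file handling are all 
hypothetical, and it only assumes that "commit" means renaming the target 
file over the committed one.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical outcomes of one prepare+commit attempt.
enum class CheckpointResult {
  COMMITTED,          // Target was written and renamed into place.
  ROLLED_BACK,        // Commit failed, target was removed again.
  TARGET_LEFT_BEHIND, // Commit failed and the removal failed as well.
  PREPARE_FAILED      // Target could not be written.
};

// Illustrative only; not a Mesos API.
CheckpointResult prepareAndCommit(
    const fs::path& target,
    const fs::path& committed,
    const std::string& data)
{
  assert(target != committed);

  // Prepare: write the new state to the target path.
  {
    std::ofstream out(target, std::ios::binary | std::ios::trunc);
    out << data;
    out.flush();
    if (!out) {
      return CheckpointResult::PREPARE_FAILED;
    }
  }

  // Commit: rename the target over the committed checkpoint.
  std::error_code ec;
  fs::rename(target, committed, ec);
  if (!ec) {
    return CheckpointResult::COMMITTED;
  }

  // Commit failed. Remove the target so that a recovering agent
  // does not find it and retry the commit.
  ec.clear();
  fs::remove(target, ec);
  if (ec) {
    // The case asked about above: the target stays on disk.
    return CheckpointResult::TARGET_LEFT_BEHIND;
  }

  return CheckpointResult::ROLLED_BACK;
}
```

    In this sketch, `TARGET_LEFT_BEHIND` is the outcome that would still be 
retried after an agent failover, so the caller would need some policy for it.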
    
    Also, we might need to checkpoint offer operations along with total 
resources atomically for agent default resources, which means we have to use a 
different checkpoint file for that.
    
    Based on that, my suggestion is that we don't touch the original 
`checkpointResources` method. Instead, let's introduce a new one that doesn't 
do this prepare+commit style checkpointing.
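
    As a rough illustration of what such a separate method could look like 
(the name `checkpointDirect` and the `.tmp` suffix are made up here and are 
not existing Mesos APIs), a single-step checkpoint could simply report the 
error to its caller, which could then send feedback to the master:

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <optional>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: writes `data` to `path` without a separate
// prepare step. Returns an error message on failure and
// std::nullopt on success.
std::optional<std::string> checkpointDirect(
    const fs::path& path,
    const std::string& data)
{
  assert(!path.empty());

  // Write to a sibling temporary file first so that readers never
  // observe a partially written checkpoint.
  fs::path temp = path;
  temp += ".tmp";

  {
    std::ofstream out(temp, std::ios::binary | std::ios::trunc);
    if (!out) {
      return "Failed to open '" + temp.string() + "'";
    }

    out << data;
    out.flush();
    if (!out) {
      return "Failed to write '" + temp.string() + "'";
    }
  }

  // Rename into place; on POSIX this replaces the destination
  // atomically.
  std::error_code ec;
  fs::rename(temp, path, ec);
  if (ec) {
    // Best-effort cleanup so that no stale temporary file remains.
    std::error_code ignored;
    fs::remove(temp, ignored);
    return "Failed to rename to '" + path.string() + "': " + ec.message();
  }

  return std::nullopt;
}
```

    Note that this sketch does not fsync the file or its directory, so it 
says nothing about durability across a machine crash.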
    
    Also, is this strictly required for our MVP? If not, I'd suggest we deal 
with that later.


- Jie Yu


On Nov. 14, 2017, 2:11 p.m., Jan Schlicht wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63797/
> -----------------------------------------------------------
> 
> (Updated Nov. 14, 2017, 2:11 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier and Jie Yu.
> 
> 
> Bugs: MESOS-8211
>     https://issues.apache.org/jira/browse/MESOS-8211
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> With offer operation handling, an agent can send feedback to the master
> when checkpointing fails.
> Old masters will still send a 'CheckpointResourcesMessage', so a wrapper
> has been added that fails over the agent when checkpointing fails.
> As before, this will result in an agent re-registration and
> reconciliation of resources.
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.hpp c0acaa639a2bacaa6955ae6c5ab41e75dc1d11f7 
>   src/slave/slave.cpp d8bacebc74790e955490a158c37ac0d9e75fd6b5 
> 
> 
> Diff: https://reviews.apache.org/r/63797/diff/1/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Jan Schlicht
> 
>
