> On Jan. 23, 2015, 6:44 p.m., Jie Yu wrote:
> > src/slave/state.hpp, line 120
> > <https://reviews.apache.org/r/29918/diff/9/?file=829596#file829596line120>
> >
> >     One more thought:
> >     
> >     Can we introduce a third parameter for this function to let users opt 
> > out the temp + rename (we can use a default value)?
> >     
> >     This function is super critical. What if /tmp has too many links? It'll 
> > cause slave to flap. For majority of the checkpointing work, we don't need 
> > this temp + rename because we know the old file does not exist.
> >     
> >     What do you think?
> 
> Michael Park wrote:
>     But if we want to guarantee that the state of the checkpointed file is 
> always valid, I don't think we can get away from writing to temp + renaming.
>     For the case where the old file does not exist, we'd do something like, 
> write directly to the file if it doesn't already exist, and delete on error?
>     We would end up with an invalid checkpoint file if we crash while 
> writing, or run into an error and crash before deleting the file.
>     
>     I'm also not sure if /tmp having too many links is the level of problem 
> we need to worry about? It seems like a very unlikely case to me, but I'm not 
> sure.
>     
>     Just my thoughts.

Looks like it causes this bug: https://issues.apache.org/jira/browse/MESOS-2319


- Jie


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29918/#review69435
-----------------------------------------------------------


On Jan. 24, 2015, 1:21 a.m., Michael Park wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/29918/
> -----------------------------------------------------------
> 
> (Updated Jan. 24, 2015, 1:21 a.m.)
> 
> 
> Review request for mesos, Adam B, Benjamin Hindman, Ben Mahler, Jie Yu, and 
> Vinod Kone.
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Introduced a generic checkpoint function.
> 
> 
> Diffs
> -----
> 
>   src/slave/state.hpp 70777cf6ab681c29ca4df601fe47903e1dbdf41f 
>   src/slave/state.cpp a36fa53099300ee03f051b0f5eaaafe9f1da68d1 
>   src/tests/slave_recovery_tests.cpp 809822e63b05a21418cd9297c927d656d6fd871d 
> 
> Diff: https://reviews.apache.org/r/29918/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Michael Park
> 
>

Reply via email to