> On Sept. 15, 2015, 10:32 p.m., Vinod Kone wrote:
> > src/tests/master_tests.cpp, line 3637
> > <https://reviews.apache.org/r/38003/diff/2/?file=1064545#file1064545line3637>
> >
> >     Does this test reliably fail (i.e., every time) without the code change 
> > in master.cpp?

No; the repro rate is about 90% (9 out of 10 runs). The root cause is that the 
restarted master re-uses the same port on the host; if the master does not re-use 
the port, the issue is not triggered. For example, I cannot reproduce it on 
Ubuntu 14.04 with the default settings, but it is easy to reproduce on OS X.


> On Sept. 15, 2015, 10:32 p.m., Vinod Kone wrote:
> > src/tests/master_tests.cpp, line 3636
> > <https://reviews.apache.org/r/38003/diff/2/?file=1064545#file1064545line3636>
> >
> >     Also add a CHECK_NE() check with both the slave ids?

Not sure it's necessary: if the master sees duplicate slave IDs, it asks the 
second slave (the one with the same ID) to shut down, so the test would already 
fail while waiting for the re-register message.


> On Sept. 15, 2015, 10:32 p.m., Vinod Kone wrote:
> > src/tests/master_tests.cpp, lines 3607-3608
> > <https://reviews.apache.org/r/38003/diff/2/?file=1064545#file1064545line3607>
> >
> >     Why specify a mock executor and test containerizer? There's a 
> > StartSlave() overload that takes just the detector (and optionally flags), 
> > which you can use?

Yes, it's only needed for the detector; I'll switch to the StartSlave() overload 
that takes only the detector.


- Klaus


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38003/#review99112
-----------------------------------------------------------


On Sept. 14, 2015, 6:08 p.m., Klaus Ma wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/38003/
> -----------------------------------------------------------
> 
> (Updated Sept. 14, 2015, 6:08 p.m.)
> 
> 
> Review request for mesos, Ben Mahler, Jie Yu, and Vinod Kone.
> 
> 
> Bugs: MESOS-3351
>     https://issues.apache.org/jira/browse/MESOS-3351
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> __Phenomenon:__
> Under a race condition, the slave is shut down after a master failover.
> 
> __Root Cause:__
> The slave is shut down because of a duplicated SlaveID. In the master, the 
> SlaveID is generated as masterInfo.id + "-S" + nextSlaveId. When the master 
> fails over, nextSlaveId is reset to 0, while masterInfo.id (generated from 
> date + ip + port + pid) may be unchanged, which leads to a duplicated SlaveID.
> 
> __Solution/Fix:__
> Generate masterInfo.id from a UUID instead of "date + ip + port + pid".
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp 5589eca 
>   src/tests/master_tests.cpp 8a6b98b 
> 
> Diff: https://reviews.apache.org/r/38003/diff/
> 
> 
> Testing
> -------
> 
> make
> make check
> 
> 
> Thanks,
> 
> Klaus Ma
> 
>
