> On Dec. 22, 2017, 3:38 a.m., Greg Mann wrote:
> > This looks like a reasonable solution to me. However, it would be great if 
> > we could reproduce the bug and then verify the fix. Looking at the log of a 
> > failed test run in the JIRA, it seems to me that the problem occurs when 
> > cleanup of an orphaned container left over from a previous test is 
> > attempted by the agent destructor called during 
> > `LinuxCapabilitiesIsolatorFlagsTest.ROOT_IsolatorFlags`. To attempt a 
> > repro, I would suggest the following:
> > 1) Peg the CPU on the machine so that libprocess takes a long time to 
> > process messages in its queue
> > 2) Run `LinuxCapabilitiesIsolatorFlagsTest.ROOT_IsolatorFlags` and one 
> > other (fast-running) test which creates a container, setting 
> > '--gtest_repeat=-1'
> > 
> > Hopefully, this recreates the circumstances that led to the failure in 
> > CI.

That's a good idea! I'm going to try to reproduce the bug by running multiple 
tests simultaneously.
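
A rough sketch of the two steps suggested above, as a shell script. The 
wrapper path `./bin/mesos-tests.sh`, the helper function names, and the extra 
`--gtest_break_on_failure` flag are assumptions on my side and not part of the 
review. The second test is passed as an argument because no specific test was 
named.

```shell
#!/bin/sh
# Sketch only: paths and helper names below are assumptions.

# Assumed location of the test wrapper inside a Mesos build directory.
MESOS_TESTS="${MESOS_TESTS:-./bin/mesos-tests.sh}"
HOG_PIDS=""

# Join test names with ':' as expected by --gtest_filter.
build_filter() {
  ( IFS=:; printf '%s\n' "$*" )
}

# Step 1: start one busy loop per CPU core so libprocess is starved.
peg_cpu() {
  n="$(nproc)"
  i=0
  while [ "$i" -lt "$n" ]; do
    yes > /dev/null &
    HOG_PIDS="$HOG_PIDS $!"
    i=$((i + 1))
  done
}

# Stop the busy loops again (list is intentionally unquoted to split PIDs).
stop_hogs() {
  [ -n "$HOG_PIDS" ] && kill $HOG_PIDS 2>/dev/null
}

# Step 2: repeat the flaky test together with one other test indefinitely.
run_repro() {
  other_test="$1"  # hypothetical: any fast test that creates a container
  peg_cpu
  trap stop_hogs EXIT
  sudo "$MESOS_TESTS" \
    --gtest_filter="$(build_filter \
      'LinuxCapabilitiesIsolatorFlagsTest.ROOT_IsolatorFlags' "$other_test")" \
    --gtest_repeat=-1 \
    --gtest_break_on_failure
}

# Nothing runs unless invoked as: ./repro.sh run <OtherTest.Name>
if [ "${1:-}" = "run" ] && [ -n "${2:-}" ]; then
  run_repro "$2"
fi
```

The script does nothing unless it is called with `run` and a test name, so 
the helpers can be inspected or sourced without starting the CPU load.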


- Andrei


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64770/#review194394
-----------------------------------------------------------


On Dec. 21, 2017, 3:58 p.m., Andrei Budnik wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64770/
> -----------------------------------------------------------
> 
> (Updated Dec. 21, 2017, 3:58 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Greg Mann, and Joseph Wu.
> 
> 
> Bugs: MESOS-7506
>     https://issues.apache.org/jira/browse/MESOS-7506
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> A race condition made the
> `LinuxCapabilitiesIsolatorFlagsTest.ROOT_IsolatorFlags` test flaky.
> This test launches multiple agents in succession, reusing the same
> variable. Reassigning the variable invokes the previous agent's
> destructor. If agent recovery has not yet completed, an orphaned
> container might briefly appear in the agent's destructor, as described
> in the code comment.
> 
> 
> Diffs
> -----
> 
>   src/tests/cluster.cpp f964bf0cd0cf22374877e5748ba142dcb5fee133 
> 
> 
> Diff: https://reviews.apache.org/r/64770/diff/4/
> 
> 
> Testing
> -------
> 
> sudo make check (Fedora 25)
> internal CI
> 
> 
> Thanks,
> 
> Andrei Budnik
> 
>
