[jira] [Commented] (MESOS-4677) LimitedCpuIsolatorTest.ROOT_CGROUPS_Pids_and_Tids is flaky.

2016-02-24 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15166531#comment-15166531
 ] 

haosdent commented on MESOS-4677:
-

Nice analysis!

{quote}
cgroup.procs doesn't change since exec doesn't change the PID. But there may 
be a race between updating the "threads" (cgroup/tasks) and us reading the 
cgroup/tasks.
{quote}
I think the value of cgroup/tasks was always the same as cgroup/cgroup.procs 
at this point, because the only process we have is "cat". According to your 
analysis, cgroup/tasks could also change here, right?

> LimitedCpuIsolatorTest.ROOT_CGROUPS_Pids_and_Tids is flaky.
> ---
>
> Key: MESOS-4677
> URL: https://issues.apache.org/jira/browse/MESOS-4677
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.27
>Reporter: Bernd Mathiske
>Assignee: Joseph Wu
>  Labels: flaky, test
>
> This test fails very often when run on CentOS 7, but may also fail elsewhere 
> sometimes. Unfortunately, it tends to only fail when --verbose is not set. 
> The output is this:
> {noformat}
> [21:45:21][Step 8/8] [ RUN  ] LimitedCpuIsolatorTest.ROOT_CGROUPS_Pids_and_Tids
> [21:45:21][Step 8/8] ../../src/tests/containerizer/isolator_tests.cpp:807: Failure
> [21:45:21][Step 8/8] Value of: usage.get().threads()
> [21:45:21][Step 8/8]   Actual: 0
> [21:45:21][Step 8/8] Expected: 1U
> [21:45:21][Step 8/8] Which is: 1
> [21:45:21][Step 8/8] [  FAILED  ] LimitedCpuIsolatorTest.ROOT_CGROUPS_Pids_and_Tids (94 ms)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4677) LimitedCpuIsolatorTest.ROOT_CGROUPS_Pids_and_Tids is flaky.

2016-02-24 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15163417#comment-15163417
 ] 

Joseph Wu commented on MESOS-4677:
--

My guess is this:
# The first {{usage = isolator.get()->usage(containerId);}} comes right after 
we isolate the test process, by writing to {{cgroup.procs}}.  Underneath, the 
cgroups API probably blocks the write from completing until the cgroups are 
updated.
# We do an {{os::close}} on a parent pipe to trigger the test process into 
{{exec}} ing.
# We immediately call {{usage = isolator.get()->usage(containerId);}} again.
# {{cgroup.procs}} doesn't change, since {{exec}} doesn't change the PID. But 
there may be a race between updating the "threads" ({{cgroup/tasks}}) and our 
read of {{cgroup/tasks}}.
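Assuming, as the steps above suggest, that {{threads()}} reflects the number of entries in {{cgroup/tasks}} (and the process count the entries in {{cgroup.procs}}), an {{Actual: 0}} would correspond to the tasks file listing no entries at the moment it was sampled. A minimal sketch of such a count follows; {{countCgroupEntries}} is a hypothetical helper, not the Mesos cgroups API. In cgroup v1, {{cgroup.procs}} lists one thread-group ID per line and {{tasks}} lists one thread ID per line, so for a single-threaded {{cat}} both should normally hold exactly one entry.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper (not the Mesos API): counts the non-empty lines of a
// cgroup v1 control file such as `cgroup.procs` (one TGID per line) or
// `tasks` (one TID per line). Returns 0 if the file cannot be opened.
std::size_t countCgroupEntries(const std::string& path) {
  std::ifstream file(path);
  std::size_t count = 0;
  std::string line;
  // getline fails immediately on an unopened stream, so that case yields 0.
  while (std::getline(file, line)) {
    if (!line.empty()) {
      ++count;
    }
  }
  return count;
}
```

Comparing {{countCgroupEntries(".../cgroup.procs")}} against {{countCgroupEntries(".../tasks")}} for the container's cgroup would show whether the two files briefly disagree around the {{exec}}.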

We can either:
* Import the {{cgroups.h}} header and use {{cgroups_lock}}/{{cgroups_unlock}} 
to synchronize.
* Add a sleep between closing the parent pipe and calling {{->usage(...)}}.
* Perform some operation on the test process that confirms it has finished 
{{exec}} ing. In this case, we can write to the {{cat}} test process and read 
back the echoed result.
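The last option can be sketched outside Mesos with plain POSIX calls. This is a hypothetical, minimal sketch ({{execHandshake}} is not a Mesos function, and it does not touch cgroups at all): the child blocks on a pipe before exec'ing {{cat}}, mirroring the test's parent pipe, and the parent treats the echoed probe as evidence that the exec has completed. It assumes a POSIX system with {{cat}} on the {{PATH}}.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cstddef>
#include <string>

// Hypothetical sketch (not Mesos code) of the "write to cat, read the echo"
// handshake. Returns true if `probe` came back unchanged from `cat`.
bool execHandshake(const std::string& probe) {
  int go[2], toChild[2], fromChild[2];
  if (pipe(go) != 0 || pipe(toChild) != 0 || pipe(fromChild) != 0) {
    return false;
  }

  pid_t pid = fork();
  if (pid < 0) {
    return false;
  }

  if (pid == 0) {
    // Child: block until the parent closes its end of the "go" pipe,
    // mirroring how the test holds the process back before exec.
    close(go[1]);
    char c;
    while (read(go[0], &c, 1) > 0) {
    }
    close(go[0]);

    // Wire the remaining pipes to stdin/stdout, then become `cat`.
    dup2(toChild[0], STDIN_FILENO);
    dup2(fromChild[1], STDOUT_FILENO);
    close(toChild[0]);
    close(toChild[1]);
    close(fromChild[0]);
    close(fromChild[1]);
    execlp("cat", "cat", static_cast<char*>(nullptr));
    _exit(127);  // Only reached if exec fails.
  }

  // Parent: keep only the pipe ends it uses.
  close(go[0]);
  close(toChild[0]);
  close(fromChild[1]);

  // Release the child (the equivalent of closing the parent pipe).
  // Sampling the cgroup right here is the point that may race with exec.
  close(go[1]);

  // The probe sits in the pipe until `cat` is running and reads it.
  bool ok = write(toChild[1], probe.data(), probe.size()) ==
            static_cast<ssize_t>(probe.size());

  // Nothing writes to fromChild before the exec, so receiving the full echo
  // means the child has finished exec'ing and is now running `cat`.
  std::string echoed;
  char buf[256];
  while (ok && echoed.size() < probe.size()) {
    ssize_t n = read(fromChild[0], buf, sizeof(buf));
    if (n <= 0) {
      ok = false;
      break;
    }
    echoed.append(buf, static_cast<std::size_t>(n));
  }

  // In the test, the second usage(...) call would go here.

  // Closing cat's stdin makes it exit; then reap the child.
  close(toChild[1]);
  close(fromChild[0]);
  int status = 0;
  waitpid(pid, &status, 0);
  return ok && echoed == probe;
}
```

Compared with a fixed sleep, the echo waits only as long as the exec actually takes, so it should neither add needless delay nor turn out too short on a slow machine.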



[jira] [Commented] (MESOS-4677) LimitedCpuIsolatorTest.ROOT_CGROUPS_Pids_and_Tids is flaky.

2016-02-23 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159861#comment-15159861
 ] 

Joseph Wu commented on MESOS-4677:
--

Managed to get some verbose logs + failure:
{code}
[ RUN  ] LimitedCpuIsolatorTest.ROOT_CGROUPS_Pids_and_Tids
I0223 18:50:51.567102  4178 linux_launcher.cpp:101] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
I0223 18:50:51.567337  4178 resources.cpp:576] Parsing resources as JSON failed: cpus:0.5;mem:512
Trying semicolon-delimited string format instead
I0223 18:50:51.573931  4192 cpushare.cpp:389] Updated 'cpu.shares' to 512 (cpus 0.5) for container d1e8db7a-d46e-42a9-9d4c-155edcf17f33
I0223 18:50:51.584337  4178 linux_launcher.cpp:304] Cloning child process with flags = 
../../src/tests/containerizer/isolator_tests.cpp:807: Failure
Value of: usage.get().threads()
  Actual: 0
Expected: 1U
Which is: 1
I0223 18:50:51.614691  4193 cgroups.cpp:2427] Freezing cgroup /sys/fs/cgroup/freezer/mesos/d1e8db7a-d46e-42a9-9d4c-155edcf17f33
I0223 18:50:51.619819  4198 cgroups.cpp:1409] Successfully froze cgroup /sys/fs/cgroup/freezer/mesos/d1e8db7a-d46e-42a9-9d4c-155edcf17f33 after 5.063936ms
I0223 18:50:51.625041  4199 cgroups.cpp:2445] Thawing cgroup /sys/fs/cgroup/freezer/mesos/d1e8db7a-d46e-42a9-9d4c-155edcf17f33
I0223 18:50:51.630051  4199 cgroups.cpp:1438] Successfullly thawed cgroup /sys/fs/cgroup/freezer/mesos/d1e8db7a-d46e-42a9-9d4c-155edcf17f33 after 4.97024ms
[  FAILED  ] LimitedCpuIsolatorTest.ROOT_CGROUPS_Pids_and_Tids (183 ms)
{code}
The only difference between this and a successful run is the failed expectation.
