[ https://issues.apache.org/jira/browse/MESOS-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163417#comment-15163417 ]

Joseph Wu commented on MESOS-4677:
----------------------------------

My guess is this:
# The first {{usage = isolator.get()->usage(containerId);}} comes right after we isolate the test process by writing to {{cgroup.procs}}. Underneath, the cgroups API probably blocks the write from completing until the cgroups are updated.
# We do an {{os::close}} on a parent pipe to trigger the test process into {{exec}}ing.
# We immediately call {{usage = isolator.get()->usage(containerId);}} again.
# {{cgroup.procs}} doesn't change, since {{exec}} doesn't change the PID. But there may be a race between the update of the "threads" ({{cgroup/tasks}}) and our read of {{cgroup/tasks}}.

We can either:
* Import the {{cgroups.h}} header and use {{cgroups_lock}}/{{cgroups_unlock}} to synchronize.
* Add a sleep between closing the parent pipe and calling {{->usage(...)}}.
* Do some sort of operation on the test process (which would confirm that it has finished {{exec}}ing). In this case we can write to the {{cat}} test process and read the echoed result.


> LimitedCpuIsolatorTest.ROOT_CGROUPS_Pids_and_Tids is flaky.
> -----------------------------------------------------------
>
>                 Key: MESOS-4677
>                 URL: https://issues.apache.org/jira/browse/MESOS-4677
>             Project: Mesos
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 0.27
>            Reporter: Bernd Mathiske
>              Labels: flaky, test
>
> This test fails very often when run on CentOS 7, but may also fail elsewhere
> sometimes. Unfortunately, it tends to only fail when --verbose is not set.
> The output is this:
> {noformat}
> [21:45:21][Step 8/8] [ RUN ] LimitedCpuIsolatorTest.ROOT_CGROUPS_Pids_and_Tids
> [21:45:21][Step 8/8] ../../src/tests/containerizer/isolator_tests.cpp:807: Failure
> [21:45:21][Step 8/8] Value of: usage.get().threads()
> [21:45:21][Step 8/8] Actual: 0
> [21:45:21][Step 8/8] Expected: 1U
> [21:45:21][Step 8/8] Which is: 1
> [21:45:21][Step 8/8] [ FAILED ] LimitedCpuIsolatorTest.ROOT_CGROUPS_Pids_and_Tids (94 ms)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
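
For illustration, here is a rough sketch of the third option from the comment (write to the {{cat}} test process and read the echoed result before calling {{->usage(...)}}). It uses plain POSIX calls rather than the Mesos/stout helpers the real test uses. The function name {{confirmExecByEcho}}, the three-pipe layout, and the {{SIGPIPE}} handling are all assumptions made for this example, not code from {{isolator_tests.cpp}}. The idea is that the pre-{{exec}} child never touches its stdin or stdout, so an echoed probe can only come from {{cat}} itself.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <csignal>
#include <string>

// Hypothetical helper: forks a child that blocks on a "go" pipe and then
// execs `cat`. Returns true once the parent has read `probe` back from
// the child, which can only happen after the exec has completed.
bool confirmExecByEcho(const std::string& probe)
{
  // Assumption for this sketch: turn a write to a dead child into an
  // EPIPE error instead of a fatal SIGPIPE.
  signal(SIGPIPE, SIG_IGN);

  int go[2], toChild[2], fromChild[2];
  if (pipe(go) != 0 || pipe(toChild) != 0 || pipe(fromChild) != 0) {
    return false;
  }

  pid_t pid = fork();
  if (pid < 0) {
    return false;
  }

  if (pid == 0) {
    // Child: wait until the parent closes its write end of `go`.
    close(go[1]);
    char c;
    while (read(go[0], &c, 1) > 0) {}
    close(go[0]);

    // Wire the data pipes to stdin/stdout, then become `cat`.
    dup2(toChild[0], STDIN_FILENO);
    dup2(fromChild[1], STDOUT_FILENO);
    close(toChild[0]);
    close(toChild[1]);
    close(fromChild[0]);
    close(fromChild[1]);

    execlp("cat", "cat", static_cast<char*>(nullptr));
    _exit(127); // Only reached if the exec failed.
  }

  // Parent: keep only the ends it uses.
  close(go[0]);
  close(toChild[0]);
  close(fromChild[1]);

  // Equivalent of the test's os::close on the parent pipe: trigger exec.
  close(go[1]);

  // Send the probe and wait for it to come back.
  bool ok = write(toChild[1], probe.data(), probe.size()) ==
            static_cast<ssize_t>(probe.size());

  std::string echoed;
  char buffer[64];
  while (ok && echoed.size() < probe.size()) {
    ssize_t n = read(fromChild[0], buffer, sizeof(buffer));
    if (n <= 0) {
      ok = false; // Child exited or the read failed before the echo.
      break;
    }
    echoed.append(buffer, static_cast<size_t>(n));
  }

  // In the real test, the second usage() call would go here, while the
  // exec'ed `cat` is still alive.

  // Closing stdin gives `cat` EOF so it exits; then reap it.
  close(toChild[1]);
  close(fromChild[0]);
  int status = 0;
  waitpid(pid, &status, 0);

  return ok && echoed == probe;
}
```

A call such as {{confirmExecByEcho("ping\n")}} returns true only after {{cat}} has echoed the probe. Note that this confirms the {{exec}} has finished; whether that alone is enough to make the {{cgroup/tasks}} read reliable is still the guess stated in the comment, not something this sketch demonstrates.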