[ 
https://issues.apache.org/jira/browse/MESOS-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376431#comment-14376431
 ] 

Vinod Kone commented on MESOS-2402:
-----------------------------------

As [~idownes] mentioned the flags have to be setup properly for exec to 
function correctly. But that fix has nothing to do with the flakiness.

The real issue seem to be that there is a race between the 
'containerizer->wait()' future being set to failed and the metric being 
updated. This is because the thread running the test might check for the metric 
value in between containerizer future being set but before the metric being 
updated (see 'MesosContainerizerProcess::__destroy()'). The fix is to settle 
the clock to ensure the metric is updated.

> MesosContainerizerDestroyTest.LauncherDestroyFailure is flaky
> -------------------------------------------------------------
>
>                 Key: MESOS-2402
>                 URL: https://issues.apache.org/jira/browse/MESOS-2402
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kone
>            Assignee: Vinod Kone
>
> "Failed to os::execvpe in childMain". Never seen this one before.
> {code}
> [ RUN      ] MesosContainerizerDestroyTest.LauncherDestroyFailure
> Using temporary directory 
> '/tmp/MesosContainerizerDestroyTest_LauncherDestroyFailure_QpjQEn'
> I0224 18:55:49.326912 21391 containerizer.cpp:461] Starting container 
> 'test_container' for executor 'executor' of framework ''
> I0224 18:55:49.332252 21391 launcher.cpp:130] Forked child with pid '23496' 
> for container 'test_container'
> ABORT: (src/subprocess.cpp:165): Failed to os::execvpe in childMain
> *** Aborted at 1424832949 (unix time) try "date -d @1424832949" if you are 
> using GNU date ***
> PC: @     0x2b178c5db0d5 (unknown)
> I0224 18:55:49.340955 21392 process.cpp:2117] Dropped / Lost event for PID: 
> scheduler-509d37ac-296f-4429-b101-af433c1800e9@127.0.1.1:39647
> I0224 18:55:49.342300 21386 containerizer.cpp:911] Destroying container 
> 'test_container'
> *** SIGABRT (@0x3e800005bc8) received by PID 23496 (TID 0x2b178f9f0700) from 
> PID 23496; stack trace: ***
>     @     0x2b178c397cb0 (unknown)
>     @     0x2b178c5db0d5 (unknown)
>     @     0x2b178c5de83b (unknown)
>     @           0x87a945 _Abort()
>     @     0x2b1789f610b9 process::childMain()
> I0224 18:55:49.391793 21386 containerizer.cpp:1120] Executor for container 
> 'test_container' has exited
> I0224 18:55:49.400478 21391 process.cpp:2770] Handling HTTP event for process 
> 'metrics' with path: '/metrics/snapshot'
> tests/containerizer_tests.cpp:485: Failure
> Value of: metrics.values["containerizer/mesos/container_destroy_errors"]
>   Actual: 16-byte object <02-00 00-00 17-2B 00-00 E0-86 0E-04 00-00 00-00>
> Expected: 1u
> Which is: 1
> [  FAILED  ] MesosContainerizerDestroyTest.LauncherDestroyFailure (89 ms)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to