[ https://issues.apache.org/jira/browse/MESOS-6166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15492384#comment-15492384 ]

Greg Mann commented on MESOS-6166:
----------------------------------

[~alexr], any thoughts?

> SlaveTest.CommandTaskWithKillPolicy is flaky
> --------------------------------------------
>
>                 Key: MESOS-6166
>                 URL: https://issues.apache.org/jira/browse/MESOS-6166
>             Project: Mesos
>          Issue Type: Bug
>          Components: tests
>            Reporter: Greg Mann
>              Labels: mesosphere, tests
>
> Observed on our internal CI:
> {code}
> [02:56:36] :   [Step 10/10] [ RUN      ] SlaveTest.CommandTaskWithKillPolicy
> [02:56:51] :   [Step 10/10] ../../src/tests/slave_tests.cpp:692: Failure
> [02:56:51] :   [Step 10/10] Failed to wait 15secs for offers
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.672449 27243 slave.cpp:4064] 
> Received ping from slave-observer(385)@172.30.2.84:38327
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.672456 27249 hierarchical.cpp:476] 
> Added agent 3936e672-068f-4f3e-9bcc-879e77b45457-S0 
> (ip-172-30-2-84.ec2.internal.mesosphere.io) with cpus(*):2; mem(*):1024; 
> disk(*):1024; ports(*):[31000-32000] (allocated: )
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.672490 27243 slave.cpp:1111] 
> Registered with master master@172.30.2.84:38327; given agent ID 
> 3936e672-068f-4f3e-9bcc-879e77b45457-S0
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.672508 27243 fetcher.cpp:86] 
> Clearing fetcher cache
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.672577 27249 
> hierarchical.cpp:1770] No inverse offers to send out!
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.672591 27249 
> hierarchical.cpp:1294] Performed allocation for agent 
> 3936e672-068f-4f3e-9bcc-879e77b45457-S0 in 119762ns
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.672610 27249 replica.cpp:537] 
> Replica received write request for position 4 from 
> __req_res__(4916)@172.30.2.84:38327
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.672703 27248 
> status_update_manager.cpp:184] Resuming sending status updates
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.672725 27243 slave.cpp:1134] 
> Checkpointing SlaveInfo to 
> '/mnt/teamcity/temp/buildTmp/SlaveTest_RemoveUnregisteredTerminatedExecutor_8RR39P/meta/slaves/3936e672-068f-4f3e-9bcc-879e77b45457-S0/slave.info'
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.672788 27247 master.cpp:6063] 
> Sending 1 offers to framework 3936e672-068f-4f3e-9bcc-879e77b45457-0000 
> (default) at scheduler-bbcfae84-a1b9-4103-9538-2872bc778326@172.30.2.84:38327
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.672894 27243 slave.cpp:1171] 
> Forwarding total oversubscribed resources 
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.672914 27249 leveldb.cpp:341] 
> Persisting action (16 bytes) to leveldb took 288811ns
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.672917 27247 sched.cpp:917] 
> Scheduler::resourceOffers took 47432ns
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.672927 27249 replica.cpp:708] 
> Persisted action TRUNCATE at position 4
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.672962 27247 master.cpp:5340] 
> Received update of agent 3936e672-068f-4f3e-9bcc-879e77b45457-S0 at 
> slave(407)@172.30.2.84:38327 (ip-172-30-2-84.ec2.internal.mesosphere.io) with 
> total oversubscribed resources 
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.673025 27244 hierarchical.cpp:540] 
> Agent 3936e672-068f-4f3e-9bcc-879e77b45457-S0 
> (ip-172-30-2-84.ec2.internal.mesosphere.io) updated with oversubscribed 
> resources  (total: cpus(*):2; mem(*):1024; disk(*):1024; 
> ports(*):[31000-32000], allocated: cpus(*):2; mem(*):1024; disk(*):1024; 
> ports(*):[31000-32000])
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.673064 27244 
> hierarchical.cpp:1675] No allocations performed
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.673077 27244 
> hierarchical.cpp:1770] No inverse offers to send out!
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.673092 27244 
> hierarchical.cpp:1294] Performed allocation for agent 
> 3936e672-068f-4f3e-9bcc-879e77b45457-S0 in 45739ns
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.673312 27247 replica.cpp:691] 
> Replica received learned notice for position 4 from @0.0.0.0:0
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.673512 27244 master.cpp:3363] 
> Processing ACCEPT call for offers: [ 3936e672-068f-4f3e-9bcc-879e77b45457-O0 
> ] on agent 3936e672-068f-4f3e-9bcc-879e77b45457-S0 at 
> slave(407)@172.30.2.84:38327 (ip-172-30-2-84.ec2.internal.mesosphere.io) for 
> framework 3936e672-068f-4f3e-9bcc-879e77b45457-0000 (default) at 
> scheduler-bbcfae84-a1b9-4103-9538-2872bc778326@172.30.2.84:38327
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.673538 27244 master.cpp:2985] 
> Authorizing framework principal 'test-principal' to launch task 1
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.673640 27247 leveldb.cpp:341] 
> Persisting action (18 bytes) to leveldb took 303466ns
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.673673 27247 leveldb.cpp:399] 
> Deleting ~2 keys from leveldb took 15435ns
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.673688 27247 replica.cpp:708] 
> Persisted action TRUNCATE at position 4
> [02:58:29]W:   [Step 10/10] W0915 02:56:35.673940 27249 validation.cpp:916] 
> Executor 'default' for task '1' uses less CPUs (None) than the minimum 
> required (0.01). Please update your executor, as this will be mandatory in 
> future releases.
> [02:58:29]W:   [Step 10/10] W0915 02:56:35.673957 27249 validation.cpp:928] 
> Executor 'default' for task '1' uses less memory (None) than the minimum 
> required (32MB). Please update your executor, as this will be mandatory in 
> future releases.
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.674020 27249 master.cpp:7809] 
> Adding task 1 with resources cpus(*):2; mem(*):1024; disk(*):1024; 
> ports(*):[31000-32000] on agent 3936e672-068f-4f3e-9bcc-879e77b45457-S0 
> (ip-172-30-2-84.ec2.internal.mesosphere.io)
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.674085 27249 master.cpp:3963] 
> Launching task 1 of framework 3936e672-068f-4f3e-9bcc-879e77b45457-0000 
> (default) at scheduler-bbcfae84-a1b9-4103-9538-2872bc778326@172.30.2.84:38327 
> with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] 
> on agent 3936e672-068f-4f3e-9bcc-879e77b45457-S0 at 
> slave(407)@172.30.2.84:38327 (ip-172-30-2-84.ec2.internal.mesosphere.io)
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.674226 27244 slave.cpp:1535] Got 
> assigned task '1' for framework 3936e672-068f-4f3e-9bcc-879e77b45457-0000
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.674393 27244 slave.cpp:1692] 
> Launching task '1' for framework 3936e672-068f-4f3e-9bcc-879e77b45457-0000
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.674649 27244 paths.cpp:536] Trying 
> to chown 
> '/mnt/teamcity/temp/buildTmp/SlaveTest_RemoveUnregisteredTerminatedExecutor_8RR39P/slaves/3936e672-068f-4f3e-9bcc-879e77b45457-S0/frameworks/3936e672-068f-4f3e-9bcc-879e77b45457-0000/executors/default/runs/976aae94-933b-4fa9-aa45-a7421e2a6c78'
>  to user 'root'
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.679693 27244 slave.cpp:6089] 
> Launching executor 'default' of framework 
> 3936e672-068f-4f3e-9bcc-879e77b45457-0000 with resources  in work directory 
> '/mnt/teamcity/temp/buildTmp/SlaveTest_RemoveUnregisteredTerminatedExecutor_8RR39P/slaves/3936e672-068f-4f3e-9bcc-879e77b45457-S0/frameworks/3936e672-068f-4f3e-9bcc-879e77b45457-0000/executors/default/runs/976aae94-933b-4fa9-aa45-a7421e2a6c78'
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.680332 27244 exec.cpp:162] 
> Version: 1.1.0
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.680429 27247 exec.cpp:212] 
> Executor started at: executor(116)@172.30.2.84:38327 with pid 27228
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.680485 27244 slave.cpp:1978] 
> Queued task '1' for executor 'default' of framework 
> 3936e672-068f-4f3e-9bcc-879e77b45457-0000
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.680512 27244 slave.cpp:864] 
> Successfully attached file 
> '/mnt/teamcity/temp/buildTmp/SlaveTest_RemoveUnregisteredTerminatedExecutor_8RR39P/slaves/3936e672-068f-4f3e-9bcc-879e77b45457-S0/frameworks/3936e672-068f-4f3e-9bcc-879e77b45457-0000/executors/default/runs/976aae94-933b-4fa9-aa45-a7421e2a6c78'
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.680882 27248 slave.cpp:4467] 
> Executor 'default' of framework 3936e672-068f-4f3e-9bcc-879e77b45457-0000 
> exited with status 0
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.680948 27248 slave.cpp:3581] 
> Handling status update TASK_FAILED (UUID: 
> 7ac8ee80-b00c-4e0a-81c7-4f954fcb4f29) for task 1 of framework 
> 3936e672-068f-4f3e-9bcc-879e77b45457-0000 from @0.0.0.0:0
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.681107 27244 master.cpp:5603] 
> Executor 'default' of framework 3936e672-068f-4f3e-9bcc-879e77b45457-0000 on 
> agent 3936e672-068f-4f3e-9bcc-879e77b45457-S0 at slave(407)@172.30.2.84:38327 
> (ip-172-30-2-84.ec2.internal.mesosphere.io): exited with status 0
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.681135 27244 master.cpp:7312] 
> Removing executor 'default' with resources  of framework 
> 3936e672-068f-4f3e-9bcc-879e77b45457-0000 on agent 
> 3936e672-068f-4f3e-9bcc-879e77b45457-S0 at slave(407)@172.30.2.84:38327 
> (ip-172-30-2-84.ec2.internal.mesosphere.io)
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.681324 27244 sched.cpp:1127] 
> Executor default on agent 3936e672-068f-4f3e-9bcc-879e77b45457-S0 exited with 
> status 0
> [02:58:29] :   [Step 10/10] ../../src/tests/slave_tests.cpp:687: Failure
> [02:58:29] :   [Step 10/10] Actual function call count doesn't match 
> EXPECT_CALL(sched, resourceOffers(&driver, _))...
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.681334 27247 
> status_update_manager.cpp:323] Received status update TASK_FAILED (UUID: 
> 7ac8ee80-b00c-4e0a-81c7-4f954fcb4f29) for task 1 of framework 
> 3936e672-068f-4f3e-9bcc-879e77b45457-0000
> [02:58:29] :   [Step 10/10]          Expected: to be called once
> [02:58:29] :   [Step 10/10]            Actual: never called - unsatisfied and 
> active
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.681355 27247 
> status_update_manager.cpp:500] Creating StatusUpdate stream for task 1 of 
> framework 3936e672-068f-4f3e-9bcc-879e77b45457-0000
> [02:58:29] :   [Step 10/10] ../../src/tests/slave_tests.cpp:684: Failure
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.681344 27244 sched.cpp:1138] 
> Scheduler::executorLost took 8086ns
> [02:58:29] :   [Step 10/10] Actual function call count doesn't match 
> EXPECT_CALL(sched, registered(&driver, _, _))...
> [02:58:29] :   [Step 10/10]          Expected: to be called once
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.681494 27247 
> status_update_manager.cpp:377] Forwarding update TASK_FAILED (UUID: 
> 7ac8ee80-b00c-4e0a-81c7-4f954fcb4f29) for task 1 of framework 
> 3936e672-068f-4f3e-9bcc-879e77b45457-0000 to the agent
> [02:58:29] :   [Step 10/10]            Actual: never called - unsatisfied and 
> active
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.681586 27243 slave.cpp:3982] 
> Forwarding the update TASK_FAILED (UUID: 
> 7ac8ee80-b00c-4e0a-81c7-4f954fcb4f29) for task 1 of framework 
> 3936e672-068f-4f3e-9bcc-879e77b45457-0000 to master@172.30.2.84:38327
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.681653 27243 slave.cpp:3876] 
> Status update manager successfully handled status update TASK_FAILED (UUID: 
> 7ac8ee80-b00c-4e0a-81c7-4f954fcb4f29) for task 1 of framework 
> 3936e672-068f-4f3e-9bcc-879e77b45457-0000
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.681710 27248 master.cpp:5475] 
> Status update TASK_FAILED (UUID: 7ac8ee80-b00c-4e0a-81c7-4f954fcb4f29) for 
> task 1 of framework 3936e672-068f-4f3e-9bcc-879e77b45457-0000 from agent 
> 3936e672-068f-4f3e-9bcc-879e77b45457-S0 at slave(407)@172.30.2.84:38327 
> (ip-172-30-2-84.ec2.internal.mesosphere.io)
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.681733 27248 master.cpp:5537] 
> Forwarding status update TASK_FAILED (UUID: 
> 7ac8ee80-b00c-4e0a-81c7-4f954fcb4f29) for task 1 of framework 
> 3936e672-068f-4f3e-9bcc-879e77b45457-0000
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.681777 27248 master.cpp:7187] 
> Updating the state of task 1 of framework 
> 3936e672-068f-4f3e-9bcc-879e77b45457-0000 (latest state: TASK_FAILED, status 
> update state: TASK_FAILED)
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.681849 27247 sched.cpp:1025] 
> Scheduler::statusUpdate took 28046ns
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.681920 27243 
> hierarchical.cpp:1003] Recovered cpus(*):2; mem(*):1024; disk(*):1024; 
> ports(*):[31000-32000] (total: cpus(*):2; mem(*):1024; disk(*):1024; 
> ports(*):[31000-32000], allocated: ) on agent 
> 3936e672-068f-4f3e-9bcc-879e77b45457-S0 from framework 
> 3936e672-068f-4f3e-9bcc-879e77b45457-0000
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.681978 27248 master.cpp:4598] 
> Processing ACKNOWLEDGE call 7ac8ee80-b00c-4e0a-81c7-4f954fcb4f29 for task 1 
> of framework 3936e672-068f-4f3e-9bcc-879e77b45457-0000 (default) at 
> scheduler-bbcfae84-a1b9-4103-9538-2872bc778326@172.30.2.84:38327 on agent 
> 3936e672-068f-4f3e-9bcc-879e77b45457-S0
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.682006 27248 master.cpp:7283] 
> Removing task 1 with resources cpus(*):2; mem(*):1024; disk(*):1024; 
> ports(*):[31000-32000] of framework 3936e672-068f-4f3e-9bcc-879e77b45457-0000 
> on agent 3936e672-068f-4f3e-9bcc-879e77b45457-S0 at 
> slave(407)@172.30.2.84:38327 (ip-172-30-2-84.ec2.internal.mesosphere.io)
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.682154 27247 
> status_update_manager.cpp:395] Received status update acknowledgement (UUID: 
> 7ac8ee80-b00c-4e0a-81c7-4f954fcb4f29) for task 1 of framework 
> 3936e672-068f-4f3e-9bcc-879e77b45457-0000
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.682191 27247 
> status_update_manager.cpp:531] Cleaning up status update stream for task 1 of 
> framework 3936e672-068f-4f3e-9bcc-879e77b45457-0000
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.682257 27247 slave.cpp:2928] 
> Status update manager successfully handled status update acknowledgement 
> (UUID: 7ac8ee80-b00c-4e0a-81c7-4f954fcb4f29) for task 1 of framework 
> 3936e672-068f-4f3e-9bcc-879e77b45457-0000
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.682271 27247 slave.cpp:6453] 
> Completing task 1
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.682278 27247 slave.cpp:4571] 
> Cleaning up executor 'default' of framework 
> 3936e672-068f-4f3e-9bcc-879e77b45457-0000
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.682389 27244 gc.cpp:55] Scheduling 
> '/mnt/teamcity/temp/buildTmp/SlaveTest_RemoveUnregisteredTerminatedExecutor_8RR39P/slaves/3936e672-068f-4f3e-9bcc-879e77b45457-S0/frameworks/3936e672-068f-4f3e-9bcc-879e77b45457-0000/executors/default/runs/976aae94-933b-4fa9-aa45-a7421e2a6c78'
>  for gc 6.99999210251259days in the future
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.682394 27247 slave.cpp:4659] 
> Cleaning up framework 3936e672-068f-4f3e-9bcc-879e77b45457-0000
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.682437 27244 gc.cpp:55] Scheduling 
> '/mnt/teamcity/temp/buildTmp/SlaveTest_RemoveUnregisteredTerminatedExecutor_8RR39P/slaves/3936e672-068f-4f3e-9bcc-879e77b45457-S0/frameworks/3936e672-068f-4f3e-9bcc-879e77b45457-0000/executors/default'
>  for gc 6.99999210201482days in the future
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.682467 27243 
> status_update_manager.cpp:285] Closing status update streams for framework 
> 3936e672-068f-4f3e-9bcc-879e77b45457-0000
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.682519 27249 gc.cpp:55] Scheduling 
> '/mnt/teamcity/temp/buildTmp/SlaveTest_RemoveUnregisteredTerminatedExecutor_8RR39P/slaves/3936e672-068f-4f3e-9bcc-879e77b45457-S0/frameworks/3936e672-068f-4f3e-9bcc-879e77b45457-0000'
>  for gc 6.99999210104days in the future
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.682574 27228 sched.cpp:1987] Asked 
> to stop the driver
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.682615 27243 sched.cpp:1187] 
> Stopping framework 3936e672-068f-4f3e-9bcc-879e77b45457-0000
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.682730 27242 master.cpp:6639] 
> Processing TEARDOWN call for framework 
> 3936e672-068f-4f3e-9bcc-879e77b45457-0000 (default) at 
> scheduler-bbcfae84-a1b9-4103-9538-2872bc778326@172.30.2.84:38327
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.682749 27242 master.cpp:6651] 
> Removing framework 3936e672-068f-4f3e-9bcc-879e77b45457-0000 (default) at 
> scheduler-bbcfae84-a1b9-4103-9538-2872bc778326@172.30.2.84:38327
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.682840 27243 hierarchical.cpp:380] 
> Deactivated framework 3936e672-068f-4f3e-9bcc-879e77b45457-0000
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.682911 27246 slave.cpp:2496] Asked 
> to shut down framework 3936e672-068f-4f3e-9bcc-879e77b45457-0000 by 
> master@172.30.2.84:38327
> [02:58:29]W:   [Step 10/10] W0915 02:56:35.682929 27246 slave.cpp:2511] 
> Cannot shut down unknown framework 3936e672-068f-4f3e-9bcc-879e77b45457-0000
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.683055 27248 hierarchical.cpp:331] 
> Removed framework 3936e672-068f-4f3e-9bcc-879e77b45457-0000
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.683274 27244 slave.cpp:783] Agent 
> terminating
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.683343 27244 master.cpp:1254] 
> Agent 3936e672-068f-4f3e-9bcc-879e77b45457-S0 at slave(407)@172.30.2.84:38327 
> (ip-172-30-2-84.ec2.internal.mesosphere.io) disconnected
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.683362 27244 master.cpp:2789] 
> Disconnecting agent 3936e672-068f-4f3e-9bcc-879e77b45457-S0 at 
> slave(407)@172.30.2.84:38327 (ip-172-30-2-84.ec2.internal.mesosphere.io)
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.683380 27244 master.cpp:2808] 
> Deactivating agent 3936e672-068f-4f3e-9bcc-879e77b45457-S0 at 
> slave(407)@172.30.2.84:38327 (ip-172-30-2-84.ec2.internal.mesosphere.io)
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.683495 27249 hierarchical.cpp:569] 
> Agent 3936e672-068f-4f3e-9bcc-879e77b45457-S0 deactivated
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.684293 27228 master.cpp:1097] 
> Master terminating
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.684494 27248 hierarchical.cpp:508] 
> Removed agent 3936e672-068f-4f3e-9bcc-879e77b45457-S0
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.695981 27228 cluster.cpp:157] 
> Creating default 'local' authorizer
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.698720 27228 leveldb.cpp:174] 
> Opened db in 2.626782ms
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.699134 27228 leveldb.cpp:181] 
> Compacted db in 395465ns
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.699151 27228 leveldb.cpp:196] 
> Created db iterator in 3504ns
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.699157 27228 leveldb.cpp:202] 
> Seeked to beginning of db in 450ns
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.699163 27228 leveldb.cpp:271] 
> Iterated through 0 keys in the db in 374ns
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.699175 27228 replica.cpp:776] 
> Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.699378 27249 recover.cpp:451] 
> Starting replica recovery
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.699470 27246 recover.cpp:477] 
> Replica is in EMPTY status
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.699770 27244 replica.cpp:673] 
> Replica in EMPTY status received a broadcasted recover request from 
> __req_res__(4917)@172.30.2.84:38327
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.699836 27249 recover.cpp:197] 
> Received a recover response from a replica in EMPTY status
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.700047 27242 recover.cpp:568] 
> Updating replica status to STARTING
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.700467 27243 master.cpp:380] 
> Master 853beb33-d144-4ede-93e3-0b9a7440469b 
> (ip-172-30-2-84.ec2.internal.mesosphere.io) started on 172.30.2.84:38327
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.700481 27243 master.cpp:382] Flags 
> at startup: --acls="" --agent_ping_timeout="15secs" 
> --agent_reregister_timeout="10mins" --allocation_interval="1secs" 
> --allocator="HierarchicalDRF" --authenticate_agents="true" 
> --authenticate_frameworks="true" --authenticate_http_frameworks="true" 
> --authenticate_http_readonly="true" --authenticate_http_readwrite="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/BZW90x/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --http_framework_authenticators="basic" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --max_agent_ping_timeouts="5" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" --quiet="false" 
> --recovery_agent_removal_limit="100%" --registry="replicated_log" 
> --registry_fetch_timeout="1mins" --registry_store_timeout="100secs" 
> --registry_strict="true" --root_submissions="true" --user_sorter="drf" 
> --version="false" --webui_dir="/usr/local/share/mesos/webui" 
> --work_dir="/tmp/BZW90x/master" --zk_session_timeout="10secs"
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.700584 27243 master.cpp:432] 
> Master only allowing authenticated frameworks to register
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.700592 27243 master.cpp:446] 
> Master only allowing authenticated agents to register
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.700595 27243 master.cpp:459] 
> Master only allowing authenticated HTTP frameworks to register
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.700600 27243 credentials.hpp:37] 
> Loading credentials for authentication from '/tmp/BZW90x/credentials'
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.700675 27243 master.cpp:504] Using 
> default 'crammd5' authenticator
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.700711 27243 http.cpp:883] Using 
> default 'basic' HTTP authenticator for realm 'mesos-master-readonly'
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.700765 27243 http.cpp:883] Using 
> default 'basic' HTTP authenticator for realm 'mesos-master-readwrite'
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.700822 27243 http.cpp:883] Using 
> default 'basic' HTTP authenticator for realm 'mesos-master-scheduler'
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.700865 27243 master.cpp:584] 
> Authorization enabled
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.700903 27246 leveldb.cpp:304] 
> Persisting metadata (8 bytes) to leveldb took 789929ns
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.700917 27246 replica.cpp:320] 
> Persisted replica status to STARTING
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.700934 27249 hierarchical.cpp:149] 
> Initialized hierarchical allocator process
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.700974 27245 
> whitelist_watcher.cpp:77] No whitelist given
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.700975 27246 recover.cpp:477] 
> Replica is in STARTING status
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.701280 27242 replica.cpp:673] 
> Replica in STARTING status received a broadcasted recover request from 
> __req_res__(4918)@172.30.2.84:38327
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.701423 27248 recover.cpp:197] 
> Received a recover response from a replica in STARTING status
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.701551 27247 recover.cpp:568] 
> Updating replica status to VOTING
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.701741 27246 master.cpp:1855] 
> Elected as the leading master!
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.701753 27246 master.cpp:1556] 
> Recovering from registrar
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.701833 27242 registrar.cpp:332] 
> Recovering registrar
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.701894 27247 leveldb.cpp:304] 
> Persisting metadata (8 bytes) to leveldb took 291737ns
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.701907 27247 replica.cpp:320] 
> Persisted replica status to VOTING
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.702080 27246 recover.cpp:582] 
> Successfully joined the Paxos group
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.702124 27246 recover.cpp:466] 
> Recover process terminated
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.702265 27244 log.cpp:553] 
> Attempting to start the writer
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.702630 27247 replica.cpp:493] 
> Replica received implicit promise request from 
> __req_res__(4919)@172.30.2.84:38327 with proposal 1
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.702941 27247 leveldb.cpp:304] 
> Persisting metadata (8 bytes) to leveldb took 295608ns
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.702953 27247 replica.cpp:342] 
> Persisted promised to 1
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.703200 27243 coordinator.cpp:238] 
> Coordinator attempting to fill missing positions
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.703621 27248 replica.cpp:388] 
> Replica received explicit promise request from 
> __req_res__(4920)@172.30.2.84:38327 for position 0 with proposal 2
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.703910 27248 leveldb.cpp:341] 
> Persisting action (8 bytes) to leveldb took 268930ns
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.703923 27248 replica.cpp:708] 
> Persisted action NOP at position 0
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.704310 27249 replica.cpp:537] 
> Replica received write request for position 0 from 
> __req_res__(4921)@172.30.2.84:38327
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.704336 27249 leveldb.cpp:436] 
> Reading position from leveldb took 9059ns
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.704653 27249 leveldb.cpp:341] 
> Persisting action (14 bytes) to leveldb took 298800ns
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.704665 27249 replica.cpp:708] 
> Persisted action NOP at position 0
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.704900 27242 replica.cpp:691] 
> Replica received learned notice for position 0 from @0.0.0.0:0
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.705199 27242 leveldb.cpp:341] 
> Persisting action (16 bytes) to leveldb took 276010ns
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.705210 27242 replica.cpp:708] 
> Persisted action NOP at position 0
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.705354 27247 log.cpp:569] Writer 
> started with ending position 0
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.705670 27242 leveldb.cpp:436] 
> Reading position from leveldb took 10561ns
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.705876 27247 registrar.cpp:365] 
> Successfully fetched the registry (0B) in 4.022784ms
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.705904 27247 registrar.cpp:464] 
> Applied 1 operations in 2213ns; attempting to update the registry
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.706089 27243 log.cpp:577] 
> Attempting to append 231 bytes to the log
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.706149 27244 coordinator.cpp:348] 
> Coordinator attempting to write APPEND action at position 1
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.706392 27242 replica.cpp:537] 
> Replica received write request for position 1 from 
> __req_res__(4922)@172.30.2.84:38327
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.706709 27242 leveldb.cpp:341] 
> Persisting action (250 bytes) to leveldb took 299260ns
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.706722 27242 replica.cpp:708] 
> Persisted action APPEND at position 1
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.706965 27243 replica.cpp:691] 
> Replica received learned notice for position 1 from @0.0.0.0:0
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.707236 27243 leveldb.cpp:341] 
> Persisting action (252 bytes) to leveldb took 253829ns
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.707248 27243 replica.cpp:708] 
> Persisted action APPEND at position 1
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.707453 27242 registrar.cpp:509] 
> Successfully updated the registry in 1.531904ms
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.707497 27244 log.cpp:596] 
> Attempting to truncate the log to 1
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.707515 27242 registrar.cpp:395] 
> Successfully recovered registrar
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.707559 27244 coordinator.cpp:348] 
> Coordinator attempting to write TRUNCATE action at position 2
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.707633 27243 master.cpp:1664] 
> Recovered 0 agents from the registry (192B); allowing 10mins for agents to 
> re-register
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.707695 27244 hierarchical.cpp:176] 
> Skipping recovery of hierarchical allocator: nothing to recover
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.707904 27242 replica.cpp:537] 
> Replica received write request for position 2 from 
> __req_res__(4923)@172.30.2.84:38327
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.708186 27242 leveldb.cpp:341] 
> Persisting action (16 bytes) to leveldb took 264026ns
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.708199 27242 replica.cpp:708] 
> Persisted action TRUNCATE at position 2
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.708447 27242 replica.cpp:691] 
> Replica received learned notice for position 2 from @0.0.0.0:0
> [02:58:29]W:   [Step 10/10] I0915 02:56:35.708708 27242 leveldb.cpp:341] 
> Persisting action (18 bytes) to leveldb took 240424ns
> [02:58:29] :   [Step 10/10] [  FAILED  ] SlaveTest.CommandTaskWithKillPolicy 
> (113708 ms)
> {code}
> Note that this showed up at about the same time as MESOS-6165, which looks 
> like a similar failure. Perhaps related?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
