[ https://issues.apache.org/jira/browse/MESOS-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13960511#comment-13960511 ]

Ian Downes commented on MESOS-1154:
-----------------------------------

I cannot reproduce this yet, but I did notice that no ZK errors are logged for either test in my runs; are they expected?

{noformat}
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from SlaveRecoveryTest/0, where TypeParam = mesos::internal::slave::MesosContainerizer
[ RUN      ] SlaveRecoveryTest/0.ReconcileKillTask
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0404 22:26:09.061580 52881 exec.cpp:131] Version: 0.19.0
I0404 22:26:09.064916 52906 exec.cpp:205] Executor registered on slave 20140404-222607-1828659978-41600-52820-0
Registered executor on smfd-atr-11-sr1.devel.twitter.com
Starting task 3084d139-253d-435b-9105-e6fdb6ccb01e
sh -c 'sleep 1000'
Forked command at 52924
I0404 22:26:09.391057 52908 exec.cpp:251] Received reconnect request from slave 20140404-222607-1828659978-41600-52820-0
I0404 22:26:09.391815 52911 exec.cpp:228] Executor re-registered on slave 20140404-222607-1828659978-41600-52820-0
Re-registered executor on smfd-atr-11-sr1.devel.twitter.com
Shutting down
Sending SIGTERM to process tree at pid 52924
Killing the following process trees:
[
--- 52924 sleep 1000
]
Command terminated with signal Terminated (pid: 52924)
[       OK ] SlaveRecoveryTest/0.ReconcileKillTask (5196 ms)
[----------] 1 test from SlaveRecoveryTest/0 (5197 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (5212 ms total)
[  PASSED  ] 1 test.

  YOU HAVE 1 DISABLED TEST
{noformat}

> Flaky SlaveRecoveryTest test: ReconcileKillTask and RecoverStatusUpdateManager.
> -------------------------------------------------------------------------------
>
>                 Key: MESOS-1154
>                 URL: https://issues.apache.org/jira/browse/MESOS-1154
>             Project: Mesos
>          Issue Type: Bug
>          Components: test
>            Reporter: Benjamin Mahler
>            Assignee: Ian Downes
>
> It looks like the test teardown is failing to remove a cgroup in both cases:
> {noformat: title=SlaveRecoveryTest/0.ReconcileKillTask}
> [ RUN      ] SlaveRecoveryTest/0.ReconcileKillTask
> 2014-03-27 22:32:49,330:44864(0x7fa2c5bab940):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:60875] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> I0327 22:32:50.850909 53927 exec.cpp:131] Version: 0.19.0
> I0327 22:32:50.853888 53953 exec.cpp:205] Executor registered on slave 20140327-223247-1740121354-49087-44864-0
> Registered executor on smfd-bkq-03-sr4.devel.twitter.com
> Starting task bc4f5f79-088e-4188-b9b5-3585ba5e6a98
> sh -c 'sleep 1000'
> Forked command at 53967
> 2014-03-27 22:32:52,667:44864(0x7fa2c5bab940):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:60875] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> 2014-03-27 22:32:56,004:44864(0x7fa2c5bab940):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:60875] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> I0327 22:32:56.032625 53957 exec.cpp:251] Received reconnect request from slave 20140327-223247-1740121354-49087-44864-0
> I0327 22:32:56.033253 53945 exec.cpp:228] Executor re-registered on slave 20140327-223247-1740121354-49087-44864-0
> Re-registered executor on smfd-bkq-03-sr4.devel.twitter.com
> Shutting down
> Sending SIGTERM to process tree at pid 53967
> Killing the following process trees:
> [
> --- 53967 sleep 1000
> ]
> Command terminated with signal Terminated (pid: 53967)
> 2014-03-27 22:32:59,341:44864(0x7fa2c5bab940):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:60875] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> ../../src/tests/mesos.cpp:387: Failure
> (cgroups::destroy(hierarchy, cgroup)).failure(): 'mesos_test_cd6a76e9-9961-40ba-b0ed-b1190954975f/0b1f90b8-b5a2-43e9-851e-93ca56cb37d0' is not a valid cgroup
> [  FAILED  ] SlaveRecoveryTest/0.ReconcileKillTask, where TypeParam = mesos::internal::slave::MesosContainerizer (13600 ms)
> {noformat}
> {noformat: title=SlaveRecoveryTest/0.RecoverStatusUpdateManager}
> [ RUN      ] SlaveRecoveryTest/0.RecoverStatusUpdateManager
> 2014-03-27 22:30:12,509:44864(0x7fa2c5bab940):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:60875] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> 2014-03-27 22:30:15,845:44864(0x7fa2c5bab940):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:60875] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> I0327 22:30:18.038733 52597 exec.cpp:131] Version: 0.19.0
> I0327 22:30:18.043248 52620 exec.cpp:205] Executor registered on slave 20140327-223012-1740121354-49087-44864-0
> Registered executor on smfd-bkq-03-sr4.devel.twitter.com
> Starting task 4c8dda64-17e8-4390-8f31-f878cdde8228
> sh -c 'sleep 1000'
> Forked command at 52637
> 2014-03-27 22:30:19,182:44864(0x7fa2c5bab940):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:60875] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> 2014-03-27 22:30:22,519:44864(0x7fa2c5bab940):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:60875] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> I0327 22:30:24.087932 52634 exec.cpp:251] Received reconnect request from slave 20140327-223012-1740121354-49087-44864-0
> I0327 22:30:24.088851 52635 exec.cpp:228] Executor re-registered on slave 20140327-223012-1740121354-49087-44864-0
> Re-registered executor on smfd-bkq-03-sr4.devel.twitter.com
> 2014-03-27 22:30:25,855:44864(0x7fa2c5bab940):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:60875] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> I0327 22:30:26.091655 52614 exec.cpp:378] Executor asked to shutdown
> Shutting down
> Sending SIGTERM to process tree at pid 52637
> Killing the following process trees:
> [
> --- 52637 sleep 1000
> ]
> Command terminated with signal Terminated (pid: 52637)
> ../../src/tests/mesos.cpp:387: Failure
> (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to kill tasks in nested cgroups: Collect failed: 'mesos_test_87c28b05-8055-407a-8ef5-eda7febd1f1c/302e57d9-e837-46cd-bd81-0e39c5b19564' is not a valid cgroup
> [  FAILED  ] SlaveRecoveryTest/0.RecoverStatusUpdateManager, where TypeParam = mesos::internal::slave::MesosContainerizer (16304 ms)
> {noformat}
> This seems to be flaky: it only occurs sometimes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
