Benjamin Hindman created MESOS-944:
--------------------------------------

             Summary: SlaveRecoveryTest/0.MultipleFrameworks hangs forever
                 Key: MESOS-944
                 URL: https://issues.apache.org/jira/browse/MESOS-944
             Project: Mesos
          Issue Type: Bug
            Reporter: Benjamin Hindman

[ RUN      ] SlaveRecoveryTest/0.MultipleFrameworks
Checkpointing executor's forked pid 25426 to '/tmp/SlaveRecoveryTest_0_MultipleFrameworks_BYLQxo/meta/slaves/201401240116-117506058-58690-24316-0/frameworks/201401240116-117506058-58690-24316-0000/executors/53e76af5-7d39-4af2-8139-87fe1a5307b1/runs/a210b425-1fcf-4083-9105-bc53a92e56bb/pids/forked.pid'
Fetching resources into '/tmp/SlaveRecoveryTest_0_MultipleFrameworks_BYLQxo/slaves/201401240116-117506058-58690-24316-0/frameworks/201401240116-117506058-58690-24316-0000/executors/53e76af5-7d39-4af2-8139-87fe1a5307b1/runs/a210b425-1fcf-4083-9105-bc53a92e56bb'
Registered executor on tw-mbp13-bhindman.local
Starting task 53e76af5-7d39-4af2-8139-87fe1a5307b1
sh -c 'sleep 1000'
Forked command at 25455
Checkpointing executor's forked pid 25456 to '/tmp/SlaveRecoveryTest_0_MultipleFrameworks_BYLQxo/meta/slaves/201401240116-117506058-58690-24316-0/frameworks/201401240116-117506058-58690-24316-0001/executors/d887c3e0-2e06-4e7f-8600-39cbb712dcbc/runs/11bdfe51-ee6b-499f-9c61-3728135d27d5/pids/forked.pid'
Fetching resources into '/tmp/SlaveRecoveryTest_0_MultipleFrameworks_BYLQxo/slaves/201401240116-117506058-58690-24316-0/frameworks/201401240116-117506058-58690-24316-0001/executors/d887c3e0-2e06-4e7f-8600-39cbb712dcbc/runs/11bdfe51-ee6b-499f-9c61-3728135d27d5'
Registered executor on tw-mbp13-bhindman.local
Starting task d887c3e0-2e06-4e7f-8600-39cbb712dcbc
sh -c 'sleep 1000'
Forked command at 25485
Re-registered executor on tw-mbp13-bhindman.local
Shutting down
Killing process tree at pid 25485
Shutting down
Killing process tree at pid 25455
Killed the following process trees:
[
--- 25485 sleep 1000
]
Killed the following process trees:
[
--- 25455 sleep 1000
]
Command terminated with signal Killed: 9 (pid: 25455)
2014-01-24 01:16:57,580:24316(0x117004000):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:58837] zk retcode=-4, errno=61(Connection refused): server refused to accept the client
../../src/tests/slave_recovery_tests.cpp:2698: Failure
Value of: status2.get().state()
  Actual: TASK_FAILED
Expected: TASK_KILLED
2014-01-24 01:17:00,912:24316(0x117004000):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:58837] zk retcode=-4, errno=61(Connection refused): server refused to accept the client
Command exited with status 0 (pid: 25320)
2014-01-24 01:17:04,244:24316(0x117004000):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:58837] zk retcode=-4, errno=61(Connection refused): server refused to accept the client
2014-01-24 01:17:07,577:24316(0x117004000):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:58837] zk retcode=-4, errno=61(Connection refused): server refused to accept the client
2014-01-24 01:17:10,909:24316(0x117004000):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:58837] zk retcode=-4, errno=61(Connection refused): server refused to accept the client
2014-01-24 01:17:14,242:24316(0x117004000):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:58837] zk retcode=-4, errno=61(Connection refused): server refused to accept the client

... hangs printing last line forever

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)