David Robinson created MESOS-712:
------------------------------------

             Summary: invalid zhandle state
                 Key: MESOS-712
                 URL: https://issues.apache.org/jira/browse/MESOS-712
             Project: Mesos
          Issue Type: Bug
    Affects Versions: 0.14.0
            Reporter: David Robinson

{noformat:title=log snippet}
2013-09-29 08:58:30,445:45279(0x7f9024e3f940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 16533ms
2013-09-29 08:58:30,445:45279(0x7f9024e3f940):ZOO_ERROR@handle_socket_error_msg@1528: Socket [192.168.0.1:2181] zk retcode=-7, errno=110(Connection timed out): connection timed out (exceeded timeout by 13199ms)
I0929 08:58:17.544836 45283 cgroups.cpp:1193] Trying to freeze cgroup /cgroup/mesos/framework_201205082337-0000000003-0000_executor_thermos-1380442146400-test_master-0-f947deee-f813-47fa-8bd3-d0f06ece941f_tag_8edc5ce9-20bc-4b09-bc92-d9bab7769738
2013-09-29 08:58:30,474:45279(0x7f9024e3f940):ZOO_DEBUG@handle_error@1141: Calling a watcher for a ZOO_SESSION_EVENT and the state=CONNECTING_STATE
2013-09-29 08:58:30,475:45279(0x7f9024e3f940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 16564ms
2013-09-29 08:58:30,475:45279(0x7f901ffff940):ZOO_DEBUG@process_completions@1765: Calling a watcher for node [], type = -1 event=ZOO_SESSION_EVENT
I0929 08:58:30.445508 45282 detector.cpp:251] Trying to create path '/home/mesos/prod/master' in ZooKeeper
2013-09-29 08:58:30,483:45279(0x7f9024e3f940):ZOO_INFO@check_events@1585: initiated connection to server [192.168.0.2:2181]
2013-09-29 08:58:30,488:45279(0x7f9031267940):ZOO_DEBUG@zoo_awexists@2587: Sending request xid=0x5244d598 for path [/home/mesos/prod/master] to 192.168.0.2:2181
2013-09-29 08:58:30,488:45279(0x7f9024e3f940):ZOO_ERROR@handle_socket_error_msg@1621: Socket [192.168.0.2:2181] zk retcode=-112, errno=116(Stale NFS file handle): sessionId=0x340523200364932 has expired.
2013-09-29 08:58:30,489:45279(0x7f9024e3f940):ZOO_DEBUG@handle_error@1138: Calling a watcher for a ZOO_SESSION_EVENT and the state=ZOO_EXPIRED_SESSION_STATE
2013-09-29 08:58:30,489:45279(0x7f9024e3f940):ZOO_DEBUG@do_io@317: IO thread terminated
2013-09-29 08:58:30,489:45279(0x7f901ffff940):ZOO_DEBUG@process_completions@1765: Calling a watcher for node [], type = -1 event=ZOO_SESSION_EVENT
2013-09-29 08:58:30,489:45279(0x7f901ffff940):ZOO_DEBUG@process_completions@1784: Calling COMPLETION_STAT for xid=0x5244d598 rc=-112
I0929 08:58:30.475751 45283 cgroups.cpp:1232] Successfully froze cgroup /cgroup/mesos/framework_201205082337-0000000003-0000_executor_thermos-1380442146400-test_master-0-f947deee-f813-47fa-8bd3-d0f06ece941f_tag_8edc5ce9-20bc-4b09-bc92-d9bab7769738 after 1 attempts
F0929 08:58:30.492090 45282 detector.cpp:266] Failed to create '/home/mesos/prod/master' in ZooKeeper: invalid zhandle state
*** Check failure stack trace: ***
I0929 08:58:30.492761 45292 cgroups.cpp:1208] Trying to thaw cgroup /cgroup/mesos/framework_201205082337-0000000003-0000_executor_thermos-1380442146400-test_master-0-f947deee-f813-47fa-8bd3-d0f06ece941f_tag_8edc5ce9-20bc-4b09-bc92-d9bab7769738
I0929 08:58:31.144810 45291 cgroups_isolator.cpp:937] Executor thermos-1380442146400-test_master-0-f947deee-f813-47fa-8bd3-d0f06ece941f of framework 201205082337-0000000003-0000 terminated with status 9
I0929 08:58:32.791193 45292 cgroups.cpp:1318] Successfully thawed /cgroup/mesos/framework_201205082337-0000000003-0000_executor_thermos-1380442146400-test_master-0-f947deee-f813-47fa-8bd3-d0f06ece941f_tag_8edc5ce9-20bc-4b09-bc92-d9bab7769738
I0929 08:58:33.675348 45298 cgroups_isolator.cpp:1275] Successfully destroyed cgroup mesos/framework_201205082337-0000000003-0000_executor_thermos-1380442146400-test_master-0-f947deee-f813-47fa-8bd3-d0f06ece941f_tag_8edc5ce9-20bc-4b09-bc92-d9bab7769738
I0929 08:58:33.676269 45300 slave.cpp:2158] Executor 'thermos-1380442146400-test_master-0-f947deee-f813-47fa-8bd3-d0f06ece941f' of framework 201205082337-0000000003-0000 has terminated with signal Killed
I0929 08:58:33.678154 45300 slave.cpp:1778] Handling status update TASK_FAILED (UUID: 4d90de5a-cdad-4bb8-ab93-7c4f185a0d24) for task 1380442146400-test_master-0-f947deee-f813-47fa-8bd3-d0f06ece941f of framework 201205082337-0000000003-0000 from @0.0.0.0:0
I0929 08:58:33.679175 45288 cgroups_isolator.cpp:700] Asked to update resources for an unknown/killed executor
I0929 08:58:33.679201 45300 status_update_manager.cpp:300] Received status update TASK_FAILED (UUID: 4d90de5a-cdad-4bb8-ab93-7c4f185a0d24) for task 1380442146400-test_master-0-f947deee-f813-47fa-8bd3-d0f06ece941f of framework 201205082337-0000000003-0000
I0929 08:58:33.680452 45300 status_update_manager.hpp:337] Checkpointing UPDATE for status update TASK_FAILED (UUID: 4d90de5a-cdad-4bb8-ab93-7c4f185a0d24) for task 1380442146400-test_master-0-f947deee-f813-47fa-8bd3-d0f06ece941f of framework 201205082337-0000000003-0000
    @ 0x7f9035fb562d google::LogMessage::Fail()
    @ 0x7f9035fb9617 google::LogMessage::SendToLog()
    @ 0x7f9035fb7f14 google::LogMessage::Flush()
I0929 08:58:35.929435 45300 status_update_manager.cpp:351] Forwarding status update TASK_FAILED (UUID: 4d90de5a-cdad-4bb8-ab93-7c4f185a0d24) for task 1380442146400-test_master-0-f947deee-f813-47fa-8bd3-d0f06ece941f of framework 201205082337-0000000003-0000 to master@10.42.69.138:5050
    @ 0x7f9035fb8146 google::LogMessageFatal::~LogMessageFatal()
    @ 0x7f9035d1a83f mesos::internal::ZooKeeperMasterDetectorProcess::connected()
    @ 0x7f9035d1f118 std::tr1::_Function_handler<>::_M_invoke()
    @ 0x7f9035d21b84 std::tr1::_Function_handler<>::_M_invoke()
    @ 0x7f9035ea6f84 process::ProcessManager::resume()
    @ 0x7f9035ea79df process::schedule()
    @ 0x7f903561083d start_thread
    @ 0x7f9033ff2f8d clone
{noformat}

The slave exited w/ SIGABRT. ZooKeeper connection issue? Should Mesos handle this gracefully?

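To make the "handle this gracefully" question concrete, here is a rough sketch of one possible behavior: when creating the path fails because the session has expired, drop the dead handle, open a fresh one and retry, instead of hitting a fatal check. Everything in it (`FakeHandle`, `Detector`, `ensurePath`, `expireSession`) is a hypothetical stand-in written for this illustration. It is not the Mesos detector code or the ZooKeeper C client API, and a real fix would also need to re-register watches and bound the reconnect time.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-ins for illustration only; these are NOT Mesos or
// ZooKeeper C client types.
enum class SessionState { Connected, Expired };
enum class CreateResult { Ok, InvalidState };

// Toy handle: once its session has expired every call fails, mirroring the
// "invalid zhandle state" error in the log above.
struct FakeHandle {
  SessionState state = SessionState::Connected;

  CreateResult create(const std::string& path) {
    (void)path;  // The toy handle does not store anything.
    return state == SessionState::Connected ? CreateResult::Ok
                                            : CreateResult::InvalidState;
  }
};

class Detector {
public:
  Detector() : handle_(new FakeHandle()) {}

  // Simulates the server expiring the current session.
  void expireSession() { handle_->state = SessionState::Expired; }

  // Tries to create `path`. If the handle is unusable, replaces it with a
  // fresh one and retries, up to `maxRetries` extra attempts. Returns false
  // instead of aborting when every attempt fails.
  bool ensurePath(const std::string& path, int maxRetries) {
    for (int attempt = 0; attempt <= maxRetries; ++attempt) {
      assert(handle_ != nullptr);
      if (handle_->create(path) == CreateResult::Ok) {
        return true;
      }
      handle_.reset(new FakeHandle());  // An expired handle cannot be reused.
      ++reconnects_;
    }
    return false;
  }

  int reconnects() const { return reconnects_; }

private:
  std::unique_ptr<FakeHandle> handle_;
  int reconnects_ = 0;
};
```

With this sketch, calling `expireSession()` and then `ensurePath("/home/mesos/prod/master", 1)` reconnects once and succeeds, where the slave in the log aborted.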
--
This message was sent by Atlassian JIRA
(v6.1#6144)