[ https://issues.apache.org/jira/browse/MESOS-7174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sameer Shah updated MESOS-7174:
-------------------------------
    Affects Version/s: 1.0.1

> Mesos Agent crashes when it is unable to reconnect to zookeeper
> ---------------------------------------------------------------
>
>                 Key: MESOS-7174
>                 URL: https://issues.apache.org/jira/browse/MESOS-7174
>             Project: Mesos
>          Issue Type: Bug
>          Components: agent
>    Affects Versions: 1.0.1
>            Reporter: Sameer Shah
>
> The Mesos agent crashed when it was unable to reconnect to ZooKeeper. Here are
> the relevant logs. I have removed hostnames and IPs from the logs and replaced
> them with HOSTNAME_1, IP_1, etc.
> {quote}
> mesos-agent[23576]: 2017-02-23 05:09:36,718:23576(0x7f932ad69700):ZOO_ERROR@handle_socket_error_msg@1666: Socket [IP1:2181] zk retcode=-7, errno=110(Connection timed out): connection to IP1:2181 timed out (exceeded timeout by 4ms)
> mesos-agent[23576]: I0223 05:09:36.719043 23592 group.cpp:460] Lost connection to ZooKeeper, attempting to reconnect ...
> mesos-agent[23576]: 2017-02-23 05:09:43,386:23576(0x7f932ad69700):ZOO_ERROR@handle_socket_error_msg@1666: Socket [IP2:2181] zk retcode=-7, errno=110(Connection timed out): connection to IP2:2181 timed out (exceeded timeout by 2ms)
> mesos-agent[23576]: W0223 05:09:46.721179 23588 group.cpp:503] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=300007df99d24f2) expiration
> mesos-agent[23576]: I0223 05:09:46.722100 23588 group.cpp:519] ZooKeeper session expired
> mesos-agent[23576]: I0223 05:09:46.722249 23609 detector.cpp:152] Detected a new leader: None
> mesos-agent[23576]: 2017-02-23 05:09:46,722:23576(0x7f932e989700):ZOO_INFO@zookeeper_close@2543: Freeing zookeeper resources for sessionId=0x300007df99d24f2
> mesos-agent[23576]: I0223 05:09:46.722776 23589 status_update_manager.cpp:174] Pausing sending status updates
> mesos-agent[23576]: I0223 05:09:46.722923 23607 slave.cpp:888] Lost leading master
> mesos-agent[23576]: I0223 05:09:46.722960 23607 slave.cpp:927] Detecting new master
> mesos-agent[23576]: 2017-02-23 05:09:46,731:23576(0x7f933198f700):ZOO_INFO@log_env@726: Client environment:zookeeper.version=zookeeper C client 3.4.8
> mesos-agent[23576]: 2017-02-23 05:09:46,731:23576(0x7f933198f700):ZOO_INFO@log_env@730: Client environment:host.name=HOSTNAME
> mesos-agent[23576]: 2017-02-23 05:09:46,731:23576(0x7f933198f700):ZOO_INFO@log_env@737: Client environment:os.name=Linux
> mesos-agent[23576]: 2017-02-23 05:09:46,731:23576(0x7f933198f700):ZOO_INFO@log_env@738: Client environment:os.arch=####
> mesos-agent[23576]: 2017-02-23 05:09:46,731:23576(0x7f933198f700):ZOO_INFO@log_env@739: Client environment:os.version=####
> mesos-agent[23576]: 2017-02-23 05:09:46,731:23576(0x7f933198f700):ZOO_INFO@log_env@747: Client environment:user.name=(null)
> mesos-agent[23576]: 2017-02-23 05:09:46,731:23576(0x7f933198f700):ZOO_INFO@log_env@755: Client environment:user.home=/root
> mesos-agent[23576]: 2017-02-23 05:09:46,731:23576(0x7f933198f700):ZOO_INFO@log_env@767: Client environment:user.dir=/
> mesos-agent[23576]: 2017-02-23 05:09:46,731:23576(0x7f933198f700):ZOO_INFO@zookeeper_init@800: Initiating client connection, host=HOSTNAME_1:2181,HOSTNAME_2:2181,HOSTNAME_3:2181,HOSTNAME_4:2181,HOSTNAME_5:2181 sessionTimeout=10000 watcher=0x7f933f98c300 sessionId=0 sessionPasswd=<null> context=0x7f92d00406a0 flags=0
> mesos-agent[23576]: W0223 05:10:20.030608 23608 slave.cpp:1480] Ignoring run task message from master@IP:5050 because it is not the expected master: None
> mesos-agent[23576]: 2017-02-23 05:10:26,906:23576(0x7f933198f700):ZOO_ERROR@getaddrs@613: getaddrinfo: No such file or directory
> mesos-agent[23576]: F0223 05:10:26.906946 23598 zookeeper.cpp:132] Failed to create ZooKeeper, zookeeper_init: No such file or directory [2]
> mesos-agent[23576]: *** Check failure stack trace: ***
> mesos-agent[23576]: @ 0x7f933fefc34d google::LogMessage::Fail()
> mesos-agent[23576]: @ 0x7f933fefe08c google::LogMessage::SendToLog()
> mesos-agent[23576]: @ 0x7f933fefbf3c google::LogMessage::Flush()
> mesos-agent[23576]: @ 0x7f933fefc149 google::LogMessage::~LogMessage()
> mesos-agent[23576]: @ 0x7f933fefd0b2 google::ErrnoLogMessage::~ErrnoLogMessage()
> mesos-agent[23576]: @ 0x7f933f98cb88 ZooKeeperProcess::initialize()
> mesos-agent[23576]: @ 0x7f933fe8cca1 process::ProcessManager::resume()
> mesos-agent[23576]: @ 0x7f933fe8cf57 _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
> mesos-agent[23576]: @ 0x7f933e3341e0 (unknown)
> mesos-agent[23576]: @ 0x7f933e58ddc5 start_thread
> mesos-agent[23576]: @ 0x7f933dd9e28d __clone
> systemd[1]: mesos-agent.service: main process exited, code=killed, status=6/ABRT
> systemd[1]: Unit mesos-agent.service entered failed state.
> systemd[1]: mesos-agent.service failed.
> systemd[1]: mesos-agent.service holdoff time over, scheduling restart.
> systemd[1]: Started Mesos Agent.
> systemd[1]: Starting Mesos Agent...
> {quote}
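
When triaging a journal like the one quoted above, it can help to pull out only the fatal glog entries (severity letter `F`). The following is a minimal Python sketch, not part of the original report. It assumes the usual glog prefix layout seen in these lines (severity letter, `mmdd`, time, thread id, `file:line]`). The names `GLOG_RE`, `parse_glog`, `fatal_entries`, and `SAMPLE` are hypothetical, and the sample lines are copied from the log with the email wrapping removed.

```python
import re

# Assumed glog prefix: <severity><mmdd> <hh:mm:ss.uuuuuu> <thread> <file>:<line>] <message>
GLOG_RE = re.compile(
    r"(?:^|\s)(?P<severity>[IWEF])(?P<date>\d{4}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+) (?P<file>[\w.]+):(?P<line>\d+)\] (?P<message>.*)"
)


def parse_glog(line):
    """Return the glog fields of a journal line as a dict, or None if it is not a glog line."""
    match = GLOG_RE.search(line)
    return match.groupdict() if match else None


def fatal_entries(lines):
    """Keep only the entries logged at FATAL severity."""
    parsed = (parse_glog(line) for line in lines)
    return [entry for entry in parsed if entry and entry["severity"] == "F"]


# Three lines taken from the quoted log (unwrapped).
SAMPLE = [
    "mesos-agent[23576]: I0223 05:09:46.722923 23607 slave.cpp:888] Lost leading master",
    "mesos-agent[23576]: 2017-02-23 05:10:26,906:23576(0x7f933198f700):"
    "ZOO_ERROR@getaddrs@613: getaddrinfo: No such file or directory",
    "mesos-agent[23576]: F0223 05:10:26.906946 23598 zookeeper.cpp:132] "
    "Failed to create ZooKeeper, zookeeper_init: No such file or directory [2]",
]

if __name__ == "__main__":
    for entry in fatal_entries(SAMPLE):
        print(entry["file"], entry["line"], entry["message"])
```

Lines emitted by the ZooKeeper C client (the `ZOO_ERROR@...` and `ZOO_INFO@...` entries) use a different prefix, so this sketch deliberately skips them.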