[ https://issues.apache.org/jira/browse/MESOS-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
lutful karim updated MESOS-5482:
--------------------------------
    Attachment:     (was: tasks_stuck_after_rebooot.marathon)

> mesos task stuck in staging after slave reboot
> ----------------------------------------------
>
>                 Key: MESOS-5482
>                 URL: https://issues.apache.org/jira/browse/MESOS-5482
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: lutful karim
>         Attachments: marathon-mesos-masters_after-reboot.log,
> mesos-masters_mesos.log, mesos_slaves_after_reboot.log,
> tasks_running_before_rebooot.marathon
>
>
> The main idea of mesos/marathon is to sleep well, but after a node reboot
> the mesos task gets stuck in staging for about 4 hours.
> To reproduce the issue:
> - Set up a mesos cluster in HA mode with systemd-enabled mesos-master and
> mesos-slave services.
> - Run docker registry (https://hub.docker.com/_/registry/ ) with the mesos
> constraint (hostname:LIKE:mesos-slave-1) on one node. Reboot the node and
> notice that the task gets stuck in staging.
> Possible workaround: service mesos-slave restart fixes the issue.
> OS: centos 7.2
> mesos version: 0.28.1
> marathon: 1.1.1
> zookeeper: 3.4.8
> docker: 1.9.1 dockerAPIversion: 1.21
> error message:
> May 30 08:38:24 euca-10-254-237-140 mesos-slave[832]: W0530 08:38:24.120013
> 909 slave.cpp:2018] Ignoring kill task
> docker-registry.066fb448-2628-11e6-bedd-d00d0ef81dc3 because the executor
> 'docker-registry.066fb448-2628-11e6-bedd-d00d0ef81dc3' of framework
> 8517fcb7-f2d0-47ad-ae02-837570bef929-0000 is terminating/terminated

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
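
For readers trying the reproduction steps in the report, the second step (a Docker registry pinned to one node with the constraint hostname:LIKE:mesos-slave-1) might correspond to a Marathon app definition roughly like the one built below. This is only a sketch: the reporter's actual app definition is not included in the message, so the image tag (registry:2), the port mapping, and the cpus/mem values are assumptions. Only the app name (docker-registry, taken from the task ID in the log line) and the constraint come from the report.

```python
import json


def registry_app(hostname_pattern="mesos-slave-1"):
    """Build a Marathon app definition pinned to agents whose hostname
    matches hostname_pattern (hypothetical helper, for illustration only)."""
    return {
        "id": "/docker-registry",  # matches the task ID prefix in the log line
        "cpus": 0.5,               # assumed resource values
        "mem": 256,
        "instances": 1,
        "container": {
            "type": "DOCKER",
            "docker": {
                "image": "registry:2",  # assumed tag; the report only links the image page
                "network": "BRIDGE",
                "portMappings": [{"containerPort": 5000, "hostPort": 0}],
            },
        },
        # Constraint from the report: hostname:LIKE:mesos-slave-1
        "constraints": [["hostname", "LIKE", hostname_pattern]],
    }


if __name__ == "__main__":
    # Print the JSON body that would be submitted to Marathon.
    print(json.dumps(registry_app(), indent=2))
```

The printed JSON could then be submitted to Marathon's app creation endpoint. After rebooting the pinned node, the task would be expected to show the staging behaviour described in the report.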