Thomas Rampelberg created MESOS-1473:
----------------------------------------
Summary: Mesos slave becomes unresponsive after launching 8 external containers
Key: MESOS-1473
URL: https://issues.apache.org/jira/browse/MESOS-1473
Project: Mesos
Issue Type: Bug
Components: containerization
Affects Versions: 0.19.0
Reporter: Thomas Rampelberg
After marathon/mesos has launched 8 tasks that use the external
containerizer, the mesos slave becomes unresponsive and is eventually removed
from the master (the slave process must be restarted to recover).
Replication steps:
:; git clone git@github.com:mesosphere/playa-mesos.git
:; cd playa-mesos
:; vagrant up
:; vagrant ssh
:; sudo mkdir -p /etc/mesos-slave
:; sudo mkdir -p /etc/mesos-master
:; echo /usr/bin/deimos | sudo dd of=/etc/mesos-slave/containerizer_path
:; echo external | sudo dd of=/etc/mesos-slave/isolation
:; curl -H "Content-Type: application/json" -X POST localhost:8080/v2/apps \
     -d '{"id": "sleep", "cmd": "while true; do sleep 10; done", "instances": 8, "cpus": 0.1, "mem": 16.0}'
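To check whether the hang depends on the instance count, the request above can be parameterized. The following is only a minimal sketch, assuming Marathon is reachable at localhost:8080 as in the steps above. The function names `app_payload` and `launch_app` are hypothetical helpers for illustration, not part of Marathon or Mesos.

```shell
# Hypothetical helper: print the Marathon app definition used in the
# replication steps, with the instance count as a parameter (default 8).
app_payload() {
    instances="${1:-8}"
    printf '{"id": "sleep", "cmd": "while true; do sleep 10; done", "instances": %s, "cpus": 0.1, "mem": 16.0}\n' "$instances"
}

# Hypothetical helper: POST the app definition to Marathon.
# Assumes Marathon listens on localhost:8080, as in the steps above.
# "-d @-" tells curl to read the request body from stdin.
launch_app() {
    app_payload "$1" | curl -sS -H "Content-Type: application/json" \
        -X POST localhost:8080/v2/apps -d @-
}
```

On a fresh VM, `launch_app 7` would submit the same app with seven instances, which may help confirm whether eight is the threshold.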
Once the 8 instances have finished launching (you can verify via `docker ps`),
the mesos slave will be completely unresponsive.
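One way to make "unresponsive" observable is to poll the slave's HTTP endpoint with a timeout. This is a rough sketch, assuming the slave listens on its default port 5051 and serves `/slave(1)/state.json`. The helper name `slave_responsive`, the `SLAVE_STATE_URL` variable, and the 5-second timeout are illustrative choices, not anything taken from the report.

```shell
# Assumed endpoint: slave state on the default port 5051.
SLAVE_STATE_URL='http://localhost:5051/slave(1)/state.json'

# Hypothetical helper: succeed (exit 0) if the URL answers within the
# timeout, fail otherwise (timeout, connection error, or HTTP error).
# $1 = URL (defaults to SLAVE_STATE_URL)
# $2 = timeout in seconds (defaults to 5)
slave_responsive() {
    url="${1:-$SLAVE_STATE_URL}"
    timeout="${2:-5}"
    curl -sf --max-time "$timeout" -o /dev/null "$url"
}
```

Running `slave_responsive || echo "slave not responding"` before and after the 8 instances launch should show the point at which the slave stops answering.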
Here's a gist with the slave logs during the event:
https://gist.github.com/pyronicide/9dc68332a29faf38c890