Greg Mann created MESOS-9918:
--------------------------------
Summary: Agent fails to scale many tasks/containers with command
health checks
Key: MESOS-9918
URL: https://issues.apache.org/jira/browse/MESOS-9918
Project: Mesos
Issue Type: Task
Components: agent, containerization
Reporter: Greg Mann
When ~50 containers are launched simultaneously in a task group on an agent,
all of which specify command health checks, they will fail to become healthy.
The {{LAUNCH_NESTED_CONTAINER_SESSION}} calls for the health checks time out,
leading to task group failure.
We should investigate the cause of the timeouts (based on previous
profiling efforts, it is likely due to the cost of forking from the agent
process) and also consider rate-limiting options that would allow operators
to scale large numbers of containers simultaneously.
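
To make the rate-limiting idea concrete, here is a minimal sketch of one possible approach: capping the number of launches in flight with a semaphore. It is not Mesos code and does not reflect any existing agent flag or API. The names {{LaunchLimiter}}, {{fake_launch}}, {{launch_all}} and {{max_in_flight}} are hypothetical, and the sleep only stands in for an expensive launch such as a fork from the agent process.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class LaunchLimiter:
    """Hypothetical helper: caps how many launches may be in flight at once."""

    def __init__(self, max_in_flight):
        # Each launch must hold one slot for its whole duration.
        self._slots = threading.BoundedSemaphore(max_in_flight)
        self._lock = threading.Lock()
        self._in_flight = 0
        self.peak_in_flight = 0  # highest concurrency actually observed

    def run(self, launch, *args):
        # Block until a slot is free, then perform the launch.
        with self._slots:
            with self._lock:
                self._in_flight += 1
                self.peak_in_flight = max(self.peak_in_flight, self._in_flight)
            try:
                return launch(*args)
            finally:
                with self._lock:
                    self._in_flight -= 1


def fake_launch(container_id):
    # Assumption: stand-in for an expensive launch (e.g. forking a process).
    time.sleep(0.01)
    return container_id


def launch_all(limiter, count):
    # Submit every launch at once; the limiter decides how many proceed.
    with ThreadPoolExecutor(max_workers=count) as pool:
        futures = [pool.submit(limiter.run, fake_launch, i) for i in range(count)]
        return [f.result() for f in futures]
```

For example, {{launch_all(LaunchLimiter(max_in_flight=5), 50)}} requests 50 launches at once but lets at most 5 proceed at a time. A concurrency cap is only one option; a token bucket that limits launches per second would be an alternative, and which fits better depends on what the profiling shows.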
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)